Re: [VOTE 2] Board resolution for Nutch as TLP

2010-04-13 Thread Sami Siren
On 04/12/2010 02:08 PM, Andrzej Bialecki wrote: Hi, Take two, after s/crawling/search/ ... Following the discussion, below is the text of the proposed Board Resolution to vote upon. [X] +1. Request the Board make Nutch a TLP -- Sami Siren

Re: [DISCUSS] Board resolution for Nutch as TLP

2010-04-10 Thread Sami Siren
Looks good to me after the proposed changes. -- Sami Siren On Sat, Apr 10, 2010 at 6:09 PM, Andrzej Bialecki a...@getopt.org wrote: On 2010-04-10 15:32, Jukka Zitting wrote: Hi, On Fri, Apr 9, 2010 at 6:52 PM, Andrzej Bialecki a...@getopt.org wrote: WHEREAS, the Board of Directors deems

Re: [DISCUSS] Nutch as a top level project (TLP)?

2010-03-23 Thread Sami Siren
and there is not much overlap with dev communities. -- Sami Siren

[jira] Commented: (NUTCH-798) Upgrade to SOLR1.4

2010-03-10 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12843546#action_12843546 ] Sami Siren commented on NUTCH-798: -- +1 Upgrade to SOLR1.4

Re: need advice trouble shooting zero results problem

2010-02-18 Thread Sami Siren
easier to diagnose situations like that. -- Sami Siren On Fri, Feb 19, 2010 at 5:24 AM, Jesse Hires jhi...@gmail.com wrote: I am getting zero results when I search, but have no idea where to look for clues as to why. Is there a log that shows failure to find search-servers.txt, or failures

[jira] Created: (NUTCH-793) search.jsp compile errors

2010-02-15 Thread Sami Siren (JIRA)
search.jsp compile errors - Key: NUTCH-793 URL: https://issues.apache.org/jira/browse/NUTCH-793 Project: Nutch Issue Type: Bug Components: web gui Reporter: Sami Siren Assignee: Sami

Re: exception in search.jsp

2010-02-15 Thread Sami Siren
Hi Jesse, thanks for spotting this. I fixed the problem in trunk, see https://issues.apache.org/jira/browse/NUTCH-793 -- Sami Siren Jesse Hires wrote: I am seeing the following and am unable to find any notes anywhere on it. org.apache.jasper.JasperException: Unable to compile class for JSP

[jira] Resolved: (NUTCH-793) search.jsp compile errors

2010-02-15 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-793. -- Resolution: Fixed committed a fix search.jsp compile errors

[jira] Resolved: (NUTCH-788) search.jsp typo causing searches to fail

2010-02-15 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-788. -- Resolution: Fixed Fix Version/s: 1.1 Assignee: Sami Siren Thanks Sammy for the fix, I

[jira] Commented: (NUTCH-789) Improvements to Tika parser

2010-02-15 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833714#action_12833714 ] Sami Siren commented on NUTCH-789: -- It would be really useful to include the improvements

[jira] Created: (NUTCH-790) Some external javadoc links are broken

2010-02-14 Thread Sami Siren (JIRA)
Siren Assignee: Sami Siren Priority: Trivial Nutch javadoc links for lucene and hadoop are broken. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.

[jira] Updated: (NUTCH-790) Some external javadoc links are broken

2010-02-14 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren updated NUTCH-790: - Attachment: NUTCH-790.patch proposed patch, fixes links for lucene and hadoop, also updates j2se link

[jira] Created: (NUTCH-791) External links for published javadocs are partially broken

2010-02-14 Thread Sami Siren (JIRA)
: documentation Reporter: Sami Siren Lucene and Hadoop links point to non existing urls. For some versions of apidocs the links are just broken and for some they do not exist at all. Basically what is required is that the javadocs are generated again with proper urls for external

[jira] Resolved: (NUTCH-790) Some external javadoc links are broken

2010-02-14 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-790. -- Resolution: Fixed Fix Version/s: 1.1 committed Some external javadoc links are broken

[jira] Updated: (NUTCH-792) Nutch version still contains 1.0

2010-02-14 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren updated NUTCH-792: - Attachment: NUTCH-792.patch pump version to 1.1-dev Nutch version still contains 1.0

[jira] Created: (NUTCH-792) Nutch version still contains 1.0

2010-02-14 Thread Sami Siren (JIRA)
Nutch version still contains 1.0 Key: NUTCH-792 URL: https://issues.apache.org/jira/browse/NUTCH-792 Project: Nutch Issue Type: Task Components: build Reporter: Sami Siren

[jira] Resolved: (NUTCH-792) Nutch version still contains 1.0

2010-02-14 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-792. -- Resolution: Fixed committed Nutch version still contains 1.0

[jira] Commented: (NUTCH-766) Tika parser

2010-02-10 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832406#action_12832406 ] Sami Siren commented on NUTCH-766: -- I suggest that we would still drive this a bit further

[jira] Updated: (NUTCH-766) Tika parser

2010-02-10 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren updated NUTCH-766: - Attachment: NutchTikaConfig.java Extended TikaConfig that is able to load parsers and can be used

[jira] Updated: (NUTCH-766) Tika parser

2010-02-10 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren updated NUTCH-766: - Attachment: TikaParser.java Modified parser that can process package formats too. To get rid of the mime

[jira] Commented: (NUTCH-673) Upgrade the Carrot2 plug-in to release 3.0

2010-02-05 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12830053#action_12830053 ] Sami Siren commented on NUTCH-673: -- {quote} Any plans or reasons not to upgrade to Lucene

[jira] Commented: (NUTCH-781) Update Tika to v0.6 for the MimeType detection

2010-02-02 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12828561#action_12828561 ] Sami Siren commented on NUTCH-781: -- {quote} the version we had was the same as the one

[jira] Resolved: (NUTCH-775) Enhance Searcher interface

2010-02-01 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-775. -- Resolution: Fixed I committed this Enhance Searcher interface

[jira] Commented: (NUTCH-781) Update Tika to v0.6 for the MimeType detection

2010-02-01 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12828275#action_12828275 ] Sami Siren commented on NUTCH-781: -- did you forget to update conf/tika-mimetypes.xml

[jira] Commented: (NUTCH-775) Enhance Searcher interface

2010-01-28 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806019#action_12806019 ] Sami Siren commented on NUTCH-775: -- If there are no objections I'll commit the proposed

[jira] Commented: (NUTCH-775) Enhance Searcher interface

2010-01-28 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806051#action_12806051 ] Sami Siren commented on NUTCH-775: -- {quote}IMHO this could go as it is ... one suggestion

[jira] Commented: (NUTCH-766) Tika parser

2010-01-27 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12805661#action_12805661 ] Sami Siren commented on NUTCH-766: -- {quote} Sure, it's more of a configuration backwards

[jira] Commented: (NUTCH-766) Tika parser

2010-01-25 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12804448#action_12804448 ] Sami Siren commented on NUTCH-766: -- +1, I'm going to agree on this one here Julien. Other

[jira] Commented: (NUTCH-766) Tika parser

2010-01-22 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12803664#action_12803664 ] Sami Siren commented on NUTCH-766: -- I took a brief look into the proposed patch, some

[jira] Commented: (NUTCH-766) Tika parser

2010-01-22 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12803673#action_12803673 ] Sami Siren commented on NUTCH-766: -- Sure, but it would be silly to block the whole Tika

[jira] Updated: (NUTCH-775) Enhance Searcher interface

2009-12-30 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren updated NUTCH-775: - Attachment: NUTCH-775.patch I ended up changing the Query API instead since the changes were smaller from

[jira] Commented: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2009-12-16 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791829#action_12791829 ] Sami Siren commented on NUTCH-666: -- We should also consider switching to Tika for language

[jira] Created: (NUTCH-775) Enhance Searcher interface

2009-12-15 Thread Sami Siren (JIRA)
Enhance Searcher interface -- Key: NUTCH-775 URL: https://issues.apache.org/jira/browse/NUTCH-775 Project: Nutch Issue Type: Improvement Components: searcher Reporter: Sami Siren

[jira] Resolved: (NUTCH-743) Site search powered by Lucene/Solr

2009-07-02 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-743. -- Resolution: Fixed committed Site search powered by Lucene/Solr

[jira] Created: (NUTCH-743) Site search powered by Lucene/Solr

2009-06-23 Thread Sami Siren (JIRA)
Siren Assignee: Sami Siren Priority: Minor Replace current Nutch site search with Lucene/Solr powered search hosted by Lucid Imagination (http://www.lucidimagination.com/search). It allows one to search all of the Nutch (content from other parts of the Lucene ecosystem

[jira] Updated: (NUTCH-743) Site search powered by Lucene/Solr

2009-06-23 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren updated NUTCH-743: - Attachment: NUTCH-743.patch If there are no objections I will commit this within a week or so. Site

[ANNOUNCE] Apache Nutch 1.0

2009-03-28 Thread Sami Siren
information on Apache Nutch, visit the project home page: http://lucene.apache.org/nutch -- Sami Siren (on behalf of the Apache Nutch community)

Re: [VOTE] Release Apache Nutch 1.0

2009-03-27 Thread Sami Siren
Thanks Andrzej, This vote has passed, we now have a release with three binding +1 votes from: -Andrzej Bialecki -Dennis Kubes -Sami Siren I'll finalize the remaining tasks and do the announcement after the package has been mirrored. ps. we should perhaps create jira issues for all

[jira] Updated: (NUTCH-730) NPE in LinkRank if no nodes with which to create the WebGraph

2009-03-27 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren updated NUTCH-730: - Fix Version/s: (was: 1.0.0) NPE in LinkRank if no nodes with which to create the WebGraph

[jira] Resolved: (NUTCH-722) Nutch contains jars that we cannot redistribute

2009-03-23 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-722. -- Resolution: Fixed removed the jars and added note about this in README.txt Nutch contains jars

NUTCH-722 is resolved

2009-03-23 Thread Sami Siren
I think we are good to go for rc2 and it also seems that the smartest thing to do with the package contents at this point is do not touch them. I will roll out the new rc later today. -- Sami Siren

[VOTE] Release Apache Nutch 1.0

2009-03-23 Thread Sami Siren
. The vote passes if at least three binding +1 votes are cast. [ ] +1 Release the packages as Apache Nutch 1.0 [ ] -1 Do not release the packages because... Here's my +1 Thanks! [1] http://svn.apache.org/viewvc/lucene/nutch/tags/release-1.0-rc2/CHANGES.txt?revision=757511 -- Sami Siren

[jira] Commented: (NUTCH-728) Improve nutch release packaging

2009-03-20 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683814#action_12683814 ] Sami Siren commented on NUTCH-728: -- not really, it just happens to be the mirror I use

Re: [VOTE] Release Apache Nutch 1.0

2009-03-19 Thread Sami Siren
is can be seen from: http://www.lucidimagination.com/search/document/33b2a26db25db492/vote_release_apache_nutch_1_0 We (as a Nutch community) would really appreciate if somebody from the PMC had the time to check it out. Thanks for your time, Sami Siren Sami Siren wrote: We're lacking one +1

Re: [VOTE] Release Apache Nutch 1.0

2009-03-19 Thread Sami Siren
thanks Jukka, Jukka Zitting wrote: Hi, On Thu, Mar 19, 2009 at 10:32 AM, Sami Siren ssi...@gmail.com wrote: We (as a Nutch community) would really appreciate if somebody from the PMC had the time to check it out. -1 The release contains the Java Advanced Imaging libraries (jai_core.jar

[jira] Created: (NUTCH-722) Nutch contains jars that we cannot redistribute

2009-03-19 Thread Sami Siren (JIRA)
Nutch contains jars that we cannot redistribute --- Key: NUTCH-722 URL: https://issues.apache.org/jira/browse/NUTCH-722 Project: Nutch Issue Type: Bug Reporter: Sami Siren

[jira] Created: (NUTCH-723) LICENCE.txt is lacking info that should be there

2009-03-19 Thread Sami Siren (JIRA)
Versions: 1.0.0 Reporter: Sami Siren Jukka's comment from email: * The LICENSE.txt file should have at least references to the licenses of the bundled libraries.

[jira] Created: (NUTCH-725) NOTICE.txt is lacking info that should be there

2009-03-19 Thread Sami Siren (JIRA)
Reporter: Sami Siren Jukka's comment from email: * The NOTICE.txt file should start with the following lines: Apache Nutch Copyright 2009 The Apache Software Foundation * The NOTICE.txt file should contain the required copyright notices from all bundled libraries

[jira] Created: (NUTCH-726) README.txt is lacking info that should be there

2009-03-19 Thread Sami Siren (JIRA)
Versions: 1.0.0 Reporter: Sami Siren from Jukka's email: * The README.txt should start with Apache Nutch instead of Nutch

[jira] Created: (NUTCH-727) Add KEYS file to release artifact

2009-03-19 Thread Sami Siren (JIRA)
Add KEYS file to release artifact - Key: NUTCH-727 URL: https://issues.apache.org/jira/browse/NUTCH-727 Project: Nutch Issue Type: Bug Affects Versions: 1.0.0 Reporter: Sami Siren comment

[DISCUSS] contents of nutch release artifact

2009-03-19 Thread Sami Siren
Jukka Zitting was suggesting we should rethink the Nutch release packaging because of its size. I don't see this as a blocker for 1.0 but we could perhaps start the discussion about this anyway so throw in your opinions... the related snippet from email discussion: Sami Siren wrote

[jira] Resolved: (NUTCH-726) README.txt is lacking info that should be there

2009-03-19 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-726. -- Resolution: Fixed Fix Version/s: 1.0.0 committed README.txt is lacking info that should

[jira] Resolved: (NUTCH-724) Drop the JAI libraries

2009-03-19 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-724. -- Resolution: Duplicate Drop the JAI libraries -- Key: NUTCH-724

[jira] Commented: (NUTCH-722) Nutch contains jars that we cannot redistribute

2009-03-19 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683482#action_12683482 ] Sami Siren commented on NUTCH-722: -- +1, i am fine with this solution too Nutch contains

Re: [DISCUSS] contents of nutch release artifact

2009-03-19 Thread Sami Siren
Andrzej Bialecki wrote: Sami Siren wrote: Jukka Zitting was suggesting we should rethink the Nutch release packaging because of its size. I don't see this as a blocker for 1.0 but we could perhaps start the discussion about this anyway so throw in your opinions... I agree with you

Re: [DISCUSS] contents of nutch release artifact

2009-03-19 Thread Sami Siren
) :) -- Sami Siren

Re: [DISCUSS] contents of nutch release artifact

2009-03-19 Thread Sami Siren
, no .svn (equivalent to svn export), simple tgz. this sounds good to me. additionally some new documentation needs to be written too. -- Sami Siren

[jira] Resolved: (NUTCH-725) NOTICE.txt is lacking info that should be there

2009-03-19 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-725. -- Resolution: Fixed went through the libs and added copyright notices NOTICE.txt is lacking info

[jira] Resolved: (NUTCH-723) LICENCE.txt is lacking info that should be there

2009-03-19 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-723. -- Resolution: Fixed added licenses of 3rd party software LICENCE.txt is lacking info that should

[jira] Issue Comment Edited: (NUTCH-723) LICENCE.txt is lacking info that should be there

2009-03-19 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683618#action_12683618 ] Sami Siren edited comment on NUTCH-723 at 3/19/09 2:11 PM: --- added

[jira] Updated: (NUTCH-728) Improve nutch release packaging

2009-03-19 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren updated NUTCH-728: - Attachment: NUTCH-728.patch add simple target to generate source release tgz from svn tag -did not touch

[jira] Commented: (NUTCH-722) Nutch contains jars that we cannot redistribute

2009-03-19 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683634#action_12683634 ] Sami Siren commented on NUTCH-722: -- if there are no objections I will commit this change

Re: [DISCUSS] contents of nutch release artifact

2009-03-19 Thread Sami Siren
Sami Siren wrote: Andrzej Bialecki wrote: How about the following: we build just 2 packages: * binary: this includes only base hadoop libs in lib/ (enough to start a local job, no optional filesystems etc), the *.job and *.war files and scripts. Scripts would check for the presence

Re: [VOTE] Release Apache Nutch 1.0

2009-03-15 Thread Sami Siren
of (at least not pgp.mit.edu). http://pgp.mit.edu:11371/pks/lookup?op=getsearch=0x0B7E6CFA -- Sami Siren On Mar 11, 2009, at 10:13 AM, Andrzej Bialecki wrote: Sami Siren wrote: Hello, I have packaged the second release candidate for Apache Nutch 1.0 release at http://people.apache.org/~siren

Re: [VOTE] Release Apache Nutch 1.0

2009-03-10 Thread Sami Siren
This vote has been cancelled due to some last minute additions. I will post another RC soon. Sami Siren wrote: -- Sami Siren Hello, I have packaged the first release candidate for Apache Nutch 1.0 release at http://people.apache.org/~siren/nutch-1.0/rc0/ See the included CHANGES.txt file

Re: Nutch ML cleanup

2009-03-10 Thread Sami Siren
Like I suspected: I have no power to do or view any admin stuff there. Btw. I am not seeing any spam, perhaps Google takes care of that for me? -- Sami Siren Sami Siren wrote: I'll take a look at this, I am pretty sure we have to ask Doug at the end :) -- Sami Siren Otis Gospodnetic wrote

[jira] Resolved: (NUTCH-715) Subcollection plugin doesn't work with default subcollections.xml file

2009-03-10 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-715. -- Resolution: Fixed committed, thanks Dmitry! Subcollection plugin doesn't work with default

[VOTE] Release Apache Nutch 1.0

2009-03-10 Thread Sami Siren
if at least three binding +1 votes are cast. [ ] +1 Release the packages as Apache Nutch 1.0 [ ] -1 Do not release the packages because... Here's my +1 Thanks! [1] http://svn.apache.org/viewvc/lucene/nutch/tags/release-1.0-rc1/CHANGES.txt?view=logpathrev=752004 -- Sami Siren

Re: [VOTE] Release Apache Nutch 1.0

2009-03-10 Thread Sami Siren
/lucene/nutch/tags/release-1.0-rc1/CHANGES.txt?view=logpathrev=752004 -- Sami Siren

[jira] Commented: (NUTCH-705) parse-rtf plugin

2009-03-10 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12680411#action_12680411 ] Sami Siren commented on NUTCH-705: -- I think we should start looking at Apache Tika for most

[jira] Created: (NUTCH-717) Make Nutch Solr integration easier

2009-03-10 Thread Sami Siren (JIRA)
Make Nutch Solr integration easier -- Key: NUTCH-717 URL: https://issues.apache.org/jira/browse/NUTCH-717 Project: Nutch Issue Type: New Feature Reporter: Sami Siren Fix For: 1.1

Re: Moving Nutch parsers to Tika

2009-03-10 Thread Sami Siren
that is totally missing from Tika is swf (https://issues.apache.org/jira/browse/TIKA-147). Tika also supports some formats that Nutch currently does not (in addition to providing more advanced parsing on some formats). -- Sami Siren

NUTCH-684 [was: Re: [VOTE] Release Apache Nutch 1.0]

2009-03-09 Thread Sami Siren
Doğacan Güney wrote: On Sun, Mar 8, 2009 at 20:25, Sami Siren ssi...@gmail.com wrote: Hello, I have packaged the first release candidate for Apache Nutch 1.0 release at http://people.apache.org/~siren/nutch-1.0/rc0/ See the included CHANGES.txt file for details on release contents

Re: NUTCH-684 [was: Re: [VOTE] Release Apache Nutch 1.0]

2009-03-09 Thread Sami Siren
Doğacan Güney wrote: On 09.Mar.2009, at 11:05, Sami Siren ssi...@gmail.com wrote: Doğacan Güney wrote: On Sun, Mar 8, 2009 at 20:25, Sami Siren ssi...@gmail.com wrote: Hello, I have packaged the first release candidate for Apache Nutch

[VOTE] Release Apache Nutch 1.0

2009-03-08 Thread Sami Siren
if at least three binding +1 votes are cast. [ ] +1 Release the packages as Apache Nutch 1.0 [ ] -1 Do not release the packages because... Thanks! -- Sami Siren

Re: planning for nutch-1.0-rc1

2009-03-05 Thread Sami Siren
I am sure all of you noticed that the release planned to be cut during this week was delayed because of a new discovery right before the deadline (NUTCH-711). That has now been fixed so it's time to move on. I am now going to build the first RC during the weekend. -- Sami Siren Sami Siren

[jira] Commented: (NUTCH-711) Indexer failing after upgrade to Hadoop 0.19.1

2009-03-04 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12678691#action_12678691 ] Sami Siren commented on NUTCH-711: -- +1 Indexer failing after upgrade to Hadoop 0.19.1

Re: [jira] Resolved: (NUTCH-711) Indexer failing after upgrade to Hadoop 0.19.1

2009-03-04 Thread Sami Siren
Alternatively you could create another issue to track the proper fix and let this close during the release process. -- Sami Siren Andrzej Bialecki (JIRA) wrote: [ https://issues.apache.org/jira/browse/NUTCH-711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

[jira] Updated: (NUTCH-700) Neko1.9.11 goes into a loop

2009-03-02 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren updated NUTCH-700: - Fix Version/s: 1.0.0 Assignee: Sami Siren This one just bit me - the effect is that parsing

Re: planning for nutch-1.0-rc1

2009-03-02 Thread Sami Siren
Andrzej Bialecki wrote: Sami Siren wrote: I am planning to build the first rc for nutch 1.0 at Tue 3.3.2009 morning (EET). There are still some issues marked as fix for 1.0 in Jira. Neither of the two remaining _bugs_ seems too important to me, actually I only count the issues assigned

[jira] Resolved: (NUTCH-700) Neko1.9.11 goes into a loop

2009-03-02 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-700. -- Resolution: Fixed reverted to 0.9.4 Neko1.9.11 goes into a loop

[jira] Resolved: (NUTCH-669) Consolidate code for Fetcher and Fetcher2

2009-03-02 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-669. -- Resolution: Fixed replaced fetcher with fetcher2 Consolidate code for Fetcher and Fetcher2

Re: [jira] Resolved: (NUTCH-669) Consolidate code for Fetcher and Fetcher2

2009-03-02 Thread Sami Siren
Andrzej Bialecki wrote: Sami Siren (JIRA) wrote: [ https://issues.apache.org/jira/browse/NUTCH-669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-669. -- Resolution: Fixed replaced fetcher with fetcher2

Re: Release 1.0?

2009-02-28 Thread Sami Siren
. -- Sami Siren -- Sami Siren Thanks. Andrzej Bialecki wrote: Marko Bauhardt wrote: Hi, is there anybody out there? ;) exists a plan when version 1.0 will be released? thanks marko On Jan 28, 2009, at 9:45 AM, Marko Bauhardt wrote: Hi all, is there a timeline

Re: Release 1.0?

2009-02-28 Thread Sami Siren
Sami Siren wrote: I think that no one else but me made any guesses about the release date? (since it is virtually impossible due to fact that this is not a paid project). Andrzej Bialecki wrote: We do exist. ;) We plan to release in February - I can't tell you yet when exactly, we

planning for nutch-1.0-rc1

2009-02-28 Thread Sami Siren
in 1.0: NUTCH-578 (kubes) NUTCH-477 (ab) NUTCH-669 (siren) I am also volunteering to push all open issues to 1.1 before starting the RC build on Tuesday. Any objections on the proposed procedure or timing? -- Sami Siren

[jira] Commented: (NUTCH-705) parse-rtf plugin

2009-02-27 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12677508#action_12677508 ] Sami Siren commented on NUTCH-705: -- I think that the patch contains some lgpl code that we

Re: Url regex normalizer

2009-02-27 Thread Sami Siren
some junit test to verify it behaves as expected? -- Sami Siren

[jira] Resolved: (NUTCH-699) Add an official solr schema for solr integration

2009-02-26 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-699. -- Resolution: Fixed committed Add an official solr schema for solr integration

[jira] Assigned: (NUTCH-669) Consolidate code for Fetcher and Fetcher2

2009-02-26 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren reassigned NUTCH-669: Assignee: Sami Siren Consolidate code for Fetcher and Fetcher2

[jira] Commented: (NUTCH-703) Upgrade to Hadoop 0.19.1

2009-02-26 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12677266#action_12677266 ] Sami Siren commented on NUTCH-703: -- Andrzej, are you working with this now? Upgrade

[jira] Resolved: (NUTCH-247) robot parser to restrict.

2009-02-24 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-247. -- Resolution: Fixed Assignee: Sami Siren (was: Dennis Kubes) committed this - added checking

[jira] Created: (NUTCH-701) replace Fetcher with Fetcher2

2009-02-24 Thread Sami Siren (JIRA)
replace Fetcher with Fetcher2 - Key: NUTCH-701 URL: https://issues.apache.org/jira/browse/NUTCH-701 Project: Nutch Issue Type: Bug Components: fetcher Reporter: Sami Siren

[jira] Updated: (NUTCH-701) Replace Fetcher with Fetcher2

2009-02-24 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren updated NUTCH-701: - Summary: Replace Fetcher with Fetcher2 (was: replace Fetcher with Fetcher2) Replace Fetcher

[jira] Resolved: (NUTCH-698) CrawlDb is corrupted after a few crawl cycles

2009-02-24 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-698. -- Resolution: Fixed committed. thanks guys CrawlDb is corrupted after a few crawl cycles

[jira] Commented: (NUTCH-699) Add an official solr schema for solr integration

2009-02-24 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12676233#action_12676233 ] Sami Siren commented on NUTCH-699: -- We could put it under conf/ ? Add an official solr

[jira] Resolved: (NUTCH-701) Replace Fetcher with Fetcher2

2009-02-24 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-701. -- Resolution: Duplicate Replace Fetcher with Fetcher2

[jira] Updated: (NUTCH-669) Consolidate code for Fetcher and Fetcher2

2009-02-24 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren updated NUTCH-669: - Fix Version/s: (was: 1.1) 1.0.0 Moving this back to 1.0 Are you close with your

[jira] Resolved: (NUTCH-694) Distributed Search Server fails

2009-02-22 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-694. -- Resolution: Fixed Committed. Thanks for testing it. Distributed Search Server fails

[jira] Commented: (NUTCH-477) Extend URLFilters to support different filtering chains

2009-02-22 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12675793#action_12675793 ] Sami Siren commented on NUTCH-477: -- It's your call. IMO the whole URLFIlters - URLFIlter
