Re: [VOTE] Release Apache Nutch 1.0

2009-03-26 Thread Doğacan Güney
So anyone else? Anyone?

On Wed, Mar 25, 2009 at 17:17, Dennis Kubes  wrote:

> +1, is this binding? :)
>
> Doğacan Güney wrote:
>
>> Another non-binding +1 from me.
>>
>> Hope this one is a keeper :D
>>
>> On Mon, Mar 23, 2009 at 22:28, Sami Siren <ssi...@gmail.com> wrote:
>>
>>Hello,
>>
>>I have packaged the third release candidate for Apache Nutch 1.0
>>release at 
>> http://people.apache.org/~siren/nutch-1.0/rc2/
>>
>>
>>See the CHANGES.txt[1] file for details on release contents and
>>latest changes. The release was made from tag:
>>http://svn.apache.org/viewvc/lucene/nutch/tags/release-1.0-rc2/
>>
>>The following issues that were discovered during the review of last
>>rc have been fixed:
>>
>>https://issues.apache.org/jira/browse/NUTCH-722
>>https://issues.apache.org/jira/browse/NUTCH-723
>>https://issues.apache.org/jira/browse/NUTCH-725
>>https://issues.apache.org/jira/browse/NUTCH-726
>>https://issues.apache.org/jira/browse/NUTCH-727
>>
>>Please vote on releasing this package as Apache Nutch 1.0. The vote
>>is open for the next 72 hours. Only votes from Lucene PMC members
>>are binding, but everyone is welcome to check the release candidate
>>and voice their approval or disapproval. The vote passes if at
>>least three binding +1 votes are cast.
>>
>>[ ] +1 Release the packages as Apache Nutch 1.0
>>[ ] -1 Do not release the packages because...
>>
>>Here's my +1
>>
>>
>>Thanks!
>>
>>
>>[1] http://svn.apache.org/viewvc/lucene/nutch/tags/release-1.0-rc2/CHANGES.txt?revision=757511
>>--Sami Siren
>>
>>
>>
>>
>> --
>> Doğacan Güney
>>
>


-- 
Doğacan Güney


[jira] Commented: (NUTCH-706) Url regex normalizer

2009-03-26 Thread Dmitry Lihachev (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689385#action_12689385 ]

Dmitry Lihachev commented on NUTCH-706:
---

I think this must be changed to 

{code:xml}
<regex>
  <pattern>(\?|&amp;)([;_]?((?i)l|j|bv_|ps_)?((?i)s|sid|phpsessid|sessionid|conversationid|sess_id)=.*?)(\?|&amp;|#|$)</pattern>
  <substitution>$1$5</substitution>
</regex>
{code}
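
For anyone who wants to check this quickly, here is a minimal sketch in plain Java (java.util.regex only, no Nutch classes) that runs both patterns against the URL fragment from the issue description. The class and method names are made up for illustration, and the "$4" substitution paired with the current pattern is an assumption, chosen because it reproduces the output quoted in the report.

```java
import java.util.regex.Pattern;

public class SessionIdRegexDemo {
    // Pattern quoted in the issue description. The "$4" substitution used with it
    // below is an assumption that reproduces the reported output.
    static final Pattern CURRENT = Pattern.compile(
        "([;_]?((?i)l|j|bv_)?((?i)sid|phpsessid|sessionid)=.*?)(\\?|&|#|$)");

    // Pattern proposed in this comment, intended to be used with "$1$5".
    // (Inside regex-normalize.xml each '&' would be written as an XML entity.)
    static final Pattern PROPOSED = Pattern.compile(
        "(\\?|&)([;_]?((?i)l|j|bv_|ps_)?"
        + "((?i)s|sid|phpsessid|sessionid|conversationid|sess_id)=.*?)(\\?|&|#|$)");

    // Hypothetical helper: replace every match of the pattern with the substitution.
    static String apply(Pattern pattern, String substitution, String url) {
        return pattern.matcher(url).replaceAll(substitution);
    }

    public static void main(String[] args) {
        String url = "&newsId=2000484784794&newsLang=en";
        // Current pattern matches "sId=" inside "newsId" and mangles the URL.
        System.out.println(apply(CURRENT, "$4", url));    // &new&newsLang=en
        // Proposed pattern needs '?' or '&' directly before the parameter name,
        // so "newsId" is left alone.
        System.out.println(apply(PROPOSED, "$1$5", url)); // &newsId=2000484784794&newsLang=en
    }
}
```

One thing to keep in mind: because the proposed pattern requires a leading '?' or '&', a path-style parameter such as ";jsessionid=..." that is not preceded by either character would no longer be matched by this rule.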

> Url regex normalizer
> 
>
> Key: NUTCH-706
> URL: https://issues.apache.org/jira/browse/NUTCH-706
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: Meghna Kukreja
> Priority: Minor
> Fix For: 1.1
>
>
> Hey,
> I encountered the following problem while trying to crawl a site using
> nutch-trunk. In the file regex-normalize.xml, the following regex is
> used to remove session ids:
> ([;_]?((?i)l|j|bv_)?((?i)sid|phpsessid|sessionid)=.*?)(\?|&|#|$).
> This pattern also transforms a url, such as,
> "&newsId=2000484784794&newsLang=en" into "&new&newsLang=en" (since it
> matches 'sId' in the 'newsId'), which is incorrect and hence does not
> get fetched. This expression needs to be changed to prevent this.
> Thanks,
> Meghna

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.