Re: [VOTE] Apache Nutch 1.5.1 Release Candidate

2012-06-26 Thread Mattmann, Chris A (388J)
Hey Markus, Don't beat yourself up over it -- you did awesome work and have been contributing a ton so who cares! If we need to do another patch release, we can easily do it (especially with super release guy Lewis!) Cheers, Chris On Jun 26, 2012, at 3:55 PM, Markus Jelsma wrote: > Hi, > >

RE: [VOTE] Apache Nutch 1.5.1 Release Candidate

2012-06-26 Thread Markus Jelsma
Hi, The HostURLNormalizer is not supposed to be in 1.5.1; this is true for other issues as well. Nutch 1.5.1 is a bugfix release and should not be pulled from trunk but from the tag + the required patches. I didn't notice it was pulled from trunk until now. The build issue has for that plugin

Re: [VOTE] Apache Nutch 1.5.1 Release Candidate

2012-06-26 Thread Sebastian Nagel
-1 The plugin urlnormalizer-host (NUTCH-1319 listed in CHANGES.txt) is missing in the bin package. It also does not build for the src package: it's missing in src/plugins/build.xml of 1.5.1. @Markus: You are right: up to 1.4 there was a top-level folder apache-nutch-1.x/ in the package (src and bi

Re: [VOTE] Apache Nutch 1.5.1 Release Candidate

2012-06-26 Thread Julien Nioche
OK, JIRA and fix for 1.6? On 26 June 2012 17:32, Markus Jelsma wrote: > This was command line. I didn't notice it with 1.5 because I unpacked that > in a GUI. It really unpacks in the cwd, or my system makes a fool out of me > :) > > wget > http://people.apache.org/~lewismc/apache-nutch-1.5.1-rc

RE: [VOTE] Apache Nutch 1.5.1 Release Candidate

2012-06-26 Thread Markus Jelsma
This was command line. I didn't notice it with 1.5 because I unpacked that in a GUI. It really unpacks in the cwd, or my system makes a fool out of me :) wget http://people.apache.org/~lewismc/apache-nutch-1.5.1-rc1/apache-nutch-1.5.1-src.tar.gz tar -xvzf apache-nutch-1.5.1-src.tar.gz ls apach

Re: problem when fetching with http-client and authentication

2012-06-26 Thread nutch.bu...@gmail.com
Thanks Lewis, but I don't think it's related to NUTCH-769. From what I understand of NUTCH-769, it concerns scenarios in which the hosts are indeed unresponsive and an exception is thrown on the same url over and over. My problem here is with protocol-httpclient. The urls and hosts are responsive, but

Re: [VOTE] Apache Nutch 1.5.1 Release Candidate

2012-06-26 Thread Julien Nioche
Probably depends on the tool you are using to open the archive. It does that with File Roller on Ubuntu but works fine on the command line or when doing "extract here" from the file menu. Not a blocker IMHO. On 26 June 2012 08:04, Markus Jelsma wrote: > Hi, > > It builds and runs smoothly but the

Re: problem when fetching with http-client and authentication

2012-06-26 Thread Lewis John Mcgibbney
Hi, On Tue, Jun 26, 2012 at 1:39 PM, nutch.bu...@gmail.com wrote: > after a while fetcher starts throwing > httpclient.connectionPoolTimeoutException: Timeout waiting for connection > for almost each url. > > Any solution for this issue? This looks like it's related to the fix in NUTCH-769 can y

Re: [VOTE] Apache Nutch 1.5.1 Release Candidate

2012-06-26 Thread Lewis John Mcgibbney
Hi Markus, I've just unpacked both the src.zip and src.tar.gz and they both create a directory apache-nutch-1.5.1-src with everything inside... is this what you require? Lewis On Tue, Jun 26, 2012 at 8:04 AM, Markus Jelsma wrote: > Hi, > > It builds and runs smoothly but there's something that

Re: parse and solrindex in nutch-2.0

2012-06-26 Thread Julien Nioche
update (or whatever the actual name of the command is) after parsing? On 25 June 2012 22:35, wrote: > Hello, > > I have tested nutch-2.0 with hbase and mysql trying to index only one url > with depth 1. > > I tried to fetch an html tag value and parse it to metadata column in > webpage object b

RE: [VOTE] Apache Nutch 1.5.1 Release Candidate

2012-06-26 Thread Markus Jelsma
Hi, It builds and runs smoothly but there's something that didn't catch my eye with 1.5 since I then used a GUI to unpack the src file: the src and bin packages decompress everything in the cwd, which means no apache-nutch-1.5 folder is created. This was the case with 1.4 and earlier. I believ