Re: Persistent problems with Ivy dependencies in Eclipse
Just for reference, I've sorted out all the problems I was having with my build environment and have updated the tutorial on our wiki. I've also commented on your issue, Kirby. Thanks for the pointer.

Lewis

On Fri, Nov 11, 2011 at 1:37 PM, Kirby Bohling kirby.bohl...@gmail.com wrote:
Re: Persistent problems with Ivy dependencies in Eclipse
Excellent Kirby, thanks for this. The obvious question, I guess: where does this leave us with regard to the urlfilter-automation libraries? For the record, can you also please point me to the JIRA issue you filed? It would be good to know where I can begin with this one.

Thanks

Lewis

On Thu, Nov 10, 2011 at 10:18 PM, Kirby Bohling kirby.bohl...@gmail.com wrote:
Re: Persistent problems with Ivy dependencies in Eclipse
Lewis,

https://issues.apache.org/jira/browse/NUTCH-1068
That is the issue I filed about the patch (it isn't directly related to this, but it is related to some potential fixes).

http://www.mail-archive.com/dev%40nutch.apache.org/msg03419.html
That's the e-mail thread where I originally mentioned the modifications to automaton, along with the patch containing the backport of the Lucene fixes.

Kirby

On Fri, Nov 11, 2011 at 11:58 AM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote:
Re: Persistent problems with Ivy dependencies in Eclipse
On 10/11/2011 04:39, Lewis John Mcgibbney wrote:
> It gets even stranger: both SWFParser and AutomationURLFilter import additional dependencies, however they are not included within their plugin/ivy/ivy.xml files! Am I missing something here?

Most likely these problems come from the initial porting of a pure Ant build to an Ant+Ivy build. We should determine which deps are really needed by these plugins and sanitize the ivy.xml files so that they make sense; if the existing files can't be untangled, we can ditch them and come up with new, clean ones.

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
Re: Persistent problems with Ivy dependencies in Eclipse
OK, so the required dependencies can be seen below:
- FeedParser: <dependency org="net.java.dev.rome" name="rome" rev="1.0.0" conf="*->master"/>
- URLAutomationFilter: <dependency org="dk.brics" name="automaton" rev="???"/>
- SWFParser: <dependency org="com.google.gwt" name="gwt-incubator" rev="2.0.1"/>
- HTMLParser: <dependency org="net.sourceforge.nekohtml" name="nekohtml" rev="1.9.15"/>

There is a really nasty hack that would work: replacing the usual ${nutch.root}-based include with <include file="../../../ivy/ivy-configurations.xml"/>. However, this is not how I want to proceed. I'm also not sure where to find the dk.brics dependency.

Any thoughts? JIRA issue?

Thanks

Lewis

On Thu, Nov 10, 2011 at 12:39 AM, Andrzej Bialecki a...@getopt.org wrote:
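The dependency lines above belong in each plugin's own ivy.xml. As a rough sketch, assuming the Nutch 1.x layout in which every plugin directory carries an ivy.xml that pulls in the shared configurations via ${nutch.root}, the FeedParser plugin's file might look like this. The info element and its module name are illustrative assumptions, not copied from the Nutch source tree.

```xml
<?xml version="1.0" ?>
<ivy-module version="1.0">
  <!-- Illustrative module info; the real plugin file may differ -->
  <info organisation="org.apache.nutch" module="${ant.project.name}"/>

  <!-- Shared configurations; ${nutch.root} has to be supplied by the Ant build -->
  <configurations>
    <include file="${nutch.root}/ivy/ivy-configurations.xml"/>
  </configurations>

  <dependencies>
    <!-- Map every conf of this plugin to rome's "master" conf,
         i.e. the rome jar itself without its transitive dependencies -->
    <dependency org="net.java.dev.rome" name="rome" rev="1.0.0" conf="*->master"/>
  </dependencies>
</ivy-module>
```

Declaring rome here, rather than in NUTCH_HOME/ivy/ivy.xml, keeps the dependency scoped to the one plugin that needs it.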
Re: Persistent problems with Ivy dependencies in Eclipse
On Thu, Nov 10, 2011 at 6:14 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote:
> I'm also not sure where to find the dk.brics dependency.

To the best of my knowledge, the Automaton library is not available via Maven's central repo. http://www.brics.dk/automaton/ is the site where you can find it. The actual jar is at:
http://www.brics.dk/automaton/automaton.jar
To get the source you have to submit an e-mail address, but it is all available under the newer BSD/MIT license.

I believe all of the functionality actually used by Nutch is available, in a faster form, buried inside the Lucene Util library 4.0 (unreleased last I knew). I believe I filed a JIRA issue about my backport of the Lucene improvements to the library, at Julian's request. I have submitted the code to the author, but I'm not sure if he has integrated it; he was short on time when I submitted all of it. It is a nice library, but it isn't very friendly to third-party users (no bug tracker, no public source repo).

Kirby
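Since automaton.jar is only published at the brics.dk URL given above, one possible way to let Ivy fetch it is a dedicated url resolver in ivysettings.xml, routed to the dk.brics organisation. This is a sketch under assumptions, not the resolver setup Nutch actually ships: the resolver name "brics" is hypothetical, and because the download URL carries no version number, every requested revision resolves to whatever jar is currently hosted there.

```xml
<ivysettings>
  <!-- Reuse Ivy's built-in settings, which define the "default" resolver chain -->
  <include url="${ivy.default.settings.dir}/ivysettings.xml"/>

  <resolvers>
    <!-- Hypothetical resolver for the unversioned jar hosted at brics.dk;
         no ivy pattern, so Ivy generates a default module descriptor -->
    <url name="brics">
      <artifact pattern="http://www.brics.dk/automaton/[artifact].[ext]"/>
    </url>
  </resolvers>

  <!-- Route only dk.brics#automaton to that resolver; everything else
       keeps using the default chain -->
  <modules>
    <module organisation="dk.brics" name="automaton" resolver="brics"/>
  </modules>
</ivysettings>
```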
Re: Persistent problems with Ivy dependencies in Eclipse
Say we were to replace ${nutch.root} with ${basedir} in the ivy.xml of every plugin that has additional dependencies over and above what we specify in NUTCH_HOME/ivy/ivy.xml. This then breaks the build. Where is the nutch.root variable actually defined? I can't find it.

Lewis

On Wed, Nov 9, 2011 at 5:41 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote:
> Hi,
>
> I've been looking closely at getting a well-configured Nutch Eclipse environment. I'm having trouble with the following plugins:
> - AutomationURLFilter
> - SWFParser
> - HTMLParser
> - FeedParser
>
> This ties back to the fact that each of these plugins has irregular dependencies over and above what we have specified in NUTCH_HOME/ivy/ivy.xml. To date, I can see no way of resolving these dependencies within Eclipse without specifying them in NUTCH_HOME/ivy/ivy.xml; however, it would be great if anyone has a workaround we could use!
>
> For completeness, I've also tried resolving the dependencies with the IvyDE plugin by manually adding the above 4 ivy.xml files, but IvyDE struggles to parse the ivy files due to the presence of the ${nutch.root} variable. Is there scope to change this? If so, to what?
>
> Thank you
>
> Lewis
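One likely reason the ${basedir} substitution breaks the build, assuming the usual layout where each plugin lives in its own directory under NUTCH_HOME/src/plugin/: when Ant builds a plugin, ${basedir} is that plugin's directory rather than NUTCH_HOME, so the include points at a file that does not exist. The fragment below uses the feed plugin as a hypothetical example and contrasts the three include variants discussed in this thread; only the first is active.

```xml
<configurations>
  <!-- Usual form: ${nutch.root} is an Ant property expected to point at
       NUTCH_HOME; tools that parse ivy.xml outside Ant (e.g. IvyDE)
       may not know its value -->
  <include file="${nutch.root}/ivy/ivy-configurations.xml"/>

  <!-- ${basedir} variant: inside src/plugin/feed this would resolve to
       src/plugin/feed/ivy/ivy-configurations.xml, which does not exist -->
  <!-- <include file="${basedir}/ivy/ivy-configurations.xml"/> -->

  <!-- Relative-path hack: climbs feed, plugin, src back to NUTCH_HOME,
       assuming the path is resolved from the plugin directory -->
  <!-- <include file="../../../ivy/ivy-configurations.xml"/> -->
</configurations>
```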
Re: Persistent problems with Ivy dependencies in Eclipse
It gets even stranger: both SWFParser and AutomationURLFilter import additional dependencies, however they are not included within their plugin/ivy/ivy.xml files! Am I missing something here?

Lewis

On Wed, Nov 9, 2011 at 6:23 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: