Re: TikaEntityProcessor on Solr 1.4?
2010/5/22 Noble Paul നോബിള് नोब्ळ् <noble.p...@gmail.com>:
> Just copying the dih-extras jar file from the nightly should be fine.

Now that I've finally got a server on which to attempt to set these things up... this turns out not to be a viable solution. The extras jar does contain the TikaEntityProcessor class, but NOT the BinFileDataSource and BinURLDataSource on which it depends.

I tried both replacing the 1.4 DIH jar with the one from trunk and adding those two specific classes to the extras jar; neither worked. (And I apologize, but I didn't copy down the exceptions involved. If I can find some free time, I might go back and make the attempt again, a bit more methodically.)

Sixten
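One way to see which of these classes a given set of jars actually provides is to try loading each one by name. The following is a minimal sketch, assuming the classes live in the `org.apache.solr.handler.dataimport` package used by the DIH sources (check your jars if in doubt); `DihClassCheck` is a hypothetical helper, not part of Solr:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper (not part of Solr): reports which of the
// classes discussed in this thread can be loaded from the current classpath.
public class DihClassCheck {

    // Fully qualified names assumed from the DIH source layout; adjust if your jars differ.
    static final List<String> REQUIRED = Arrays.asList(
            "org.apache.solr.handler.dataimport.TikaEntityProcessor",
            "org.apache.solr.handler.dataimport.BinFileDataSource",
            "org.apache.solr.handler.dataimport.BinURLDataSource");

    // Tries to load each class without running its static initializers.
    // A LinkageError (e.g. NoClassDefFoundError for a missing superclass)
    // is treated as "not loadable", just like ClassNotFoundException.
    static Map<String, Boolean> check(List<String> names, ClassLoader loader) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : names) {
            try {
                Class.forName(name, false, loader);
                result.put(name, true);
            } catch (ClassNotFoundException | LinkageError e) {
                result.put(name, false);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ClassLoader loader = DihClassCheck.class.getClassLoader();
        for (Map.Entry<String, Boolean> e : check(REQUIRED, loader).entrySet()) {
            System.out.println((e.getValue() ? "found    " : "MISSING  ") + e.getKey());
        }
    }
}
```

Run it with the same DIH jars on the classpath that Solr would load; MISSING lines for the two Bin*DataSource classes would match the situation described above. A class loading here is not proof that it works inside Solr, since Solr's own class loader may see a different set of jars.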
RE: TikaEntityProcessor on Solr 1.4?
When I wanted to add some content to the SolrJ wiki for GlassFish, I had a problem: the wiki's anti-spam measures broke the ability to create a new account. Someone here (Chris, I think) was kind enough to create a ticket in the correct place:
https://issues.apache.org/jira/browse/INFRA-2726

As you can see, it was solved very quickly. I am not suggesting that the problem is the same, only that this may be the correct place to open a new ticket about the problem of getting a file from the wiki, and perhaps someone there can help you.

Tim

-----Original Message-----
From: Sixten Otto [mailto:six...@sfko.com]
Sent: Tuesday, June 08, 2010 3:53 PM
To: solr-user@lucene.apache.org
Subject: Re: TikaEntityProcessor on Solr 1.4?

2010/5/22 Noble Paul നോബിള് नोब्ळ् <noble.p...@gmail.com>:
> Just copying the dih-extras jar file from the nightly should be fine.

Now that I've finally got a server on which to attempt to set these things up... this turns out not to be a viable solution. The extras jar does contain the TikaEntityProcessor class, but NOT the BinFileDataSource and BinURLDataSource on which it depends.

I tried both replacing the 1.4 DIH jar with the one from trunk and adding those two specific classes to the extras jar; neither worked. (And I apologize, but I didn't copy down the exceptions involved. If I can find some free time, I might go back and make the attempt again, a bit more methodically.)

Sixten
Re: TikaEntityProcessor on Solr 1.4?
Just copying the dih-extras jar file from the nightly should be fine.

On Sat, May 22, 2010 at 3:12 AM, Sixten Otto <six...@sfko.com> wrote:
> On Fri, May 21, 2010 at 5:30 PM, Chris Harris <rygu...@gmail.com> wrote:
>> Actually, rather than cherry-pick just the changes from SOLR-1358 and SOLR-1583, what I did was merge in all DataImportHandler-related changes from the 1.4 release up through Solr trunk r890679 (inclusive). I'm not sure if that's what would work best for you, but it's one option.
>
> Of course, I'd rather not have to build my own. But if I'm going to dabble in the source at all, it's just a slippery slope from the former to the latter. :-) (My main hesitation in doing so would be that I'm new enough to the code that I have no idea what core changes the trunk's DIH might also depend on. And my Java's pretty rusty.)
>
> How did you arrive at your patch? Just grafting the entire trunk/solr/contrib/dataimporthandler onto 1.4's code? Or did you go through Jira/SVN looking for applicable changesets?
>
> I'll be very interested to hear how your testing goes!
>
> Sixten

--
Noble Paul | Systems Architect | AOL | http://aol.com
Re: TikaEntityProcessor on Solr 1.4?
2010/5/19 Noble Paul നോബിള് नोब्ळ् <noble.p...@gmail.com>:
> I guess it should work, because TikaEntityProcessor does not use any new 1.4 APIs.
>
> On Wed, May 19, 2010 at 1:17 AM, Sixten Otto <six...@sfko.com> wrote:
>> The TikaEntityProcessor class that enables DataImportHandler to process business documents was added after the release of Solr 1.4, ... Has anyone tried back-porting those changes to Solr 1.4?

Did you mean new 1.5 APIs (since TEP was added *after* 1.4 was released)? Even then, that doesn't make a lot of sense to me, as at least a couple of new things (the binary data sources) *were* added to support TikaEntityProcessor.

I'm sorry if I'm being dense, but I'm having trouble understanding this answer.

Sixten
Re: TikaEntityProcessor on Solr 1.4?
You are right that TikaEntityProcessor has a couple of other prerequisites beyond stock Solr 1.4. I think the main point is that they're relatively minor. I've merged TikaEntityProcessor and its prerequisites and dependencies into my Solr 1.4 tree, and it compiles fine, though I haven't yet tested that TikaEntityProcessor actually works in my setup.

Actually, rather than cherry-pick just the changes from SOLR-1358 and SOLR-1583, what I did was merge in all DataImportHandler-related changes from the 1.4 release up through Solr trunk r890679 (inclusive). I'm not sure if that's what would work best for you, but it's one option.

On Fri, May 21, 2010 at 1:28 PM, Sixten Otto <six...@sfko.com> wrote:
> 2010/5/19 Noble Paul നോബിള് नोब्ळ् <noble.p...@gmail.com>:
>> I guess it should work, because TikaEntityProcessor does not use any new 1.4 APIs.
>>
>> On Wed, May 19, 2010 at 1:17 AM, Sixten Otto <six...@sfko.com> wrote:
>>> The TikaEntityProcessor class that enables DataImportHandler to process business documents was added after the release of Solr 1.4, ... Has anyone tried back-porting those changes to Solr 1.4?
>
> Did you mean new 1.5 APIs (since TEP was added *after* 1.4 was released)? Even then, that doesn't make a lot of sense to me, as at least a couple of new things (the binary data sources) *were* added to support TikaEntityProcessor.
>
> I'm sorry if I'm being dense, but I'm having trouble understanding this answer.
>
> Sixten
Re: TikaEntityProcessor on Solr 1.4?
On Fri, May 21, 2010 at 5:30 PM, Chris Harris <rygu...@gmail.com> wrote:
> Actually, rather than cherry-pick just the changes from SOLR-1358 and SOLR-1583, what I did was merge in all DataImportHandler-related changes from the 1.4 release up through Solr trunk r890679 (inclusive). I'm not sure if that's what would work best for you, but it's one option.

Of course, I'd rather not have to build my own. But if I'm going to dabble in the source at all, it's just a slippery slope from the former to the latter. :-) (My main hesitation in doing so would be that I'm new enough to the code that I have no idea what core changes the trunk's DIH might also depend on. And my Java's pretty rusty.)

How did you arrive at your patch? Just grafting the entire trunk/solr/contrib/dataimporthandler onto 1.4's code? Or did you go through Jira/SVN looking for applicable changesets?

I'll be very interested to hear how your testing goes!

Sixten
Re: TikaEntityProcessor on Solr 1.4?
I guess it should work, because TikaEntityProcessor does not use any new 1.4 APIs.

On Wed, May 19, 2010 at 1:17 AM, Sixten Otto <six...@sfko.com> wrote:
> Sorry to repeat this question, but I realized that it probably belonged in its own thread:
>
> The TikaEntityProcessor class that enables DataImportHandler to process business documents was added after the release of Solr 1.4, along with some other changes (like the binary DataSources) to support it. Obviously, there hasn't been an official release of Solr since then. Has anyone tried back-porting those changes to Solr 1.4?
>
> (I do see that the question was asked last month, without any response: http://www.lucidimagination.com/search/document/5d2d25bc57c370e9)
>
> The patches for these issues don't seem all that complex or pervasive, but it's hard for me (as a Solr n00b) to tell whether this is really all that's involved:
> https://issues.apache.org/jira/browse/SOLR-1583
> https://issues.apache.org/jira/browse/SOLR-1358
>
> Sixten

--
Noble Paul | Systems Architect | AOL | http://aol.com
TikaEntityProcessor on Solr 1.4?
Sorry to repeat this question, but I realized that it probably belonged in its own thread:

The TikaEntityProcessor class that enables DataImportHandler to process business documents was added after the release of Solr 1.4, along with some other changes (like the binary DataSources) to support it. Obviously, there hasn't been an official release of Solr since then. Has anyone tried back-porting those changes to Solr 1.4?

(I do see that the question was asked last month, without any response: http://www.lucidimagination.com/search/document/5d2d25bc57c370e9)

The patches for these issues don't seem all that complex or pervasive, but it's hard for me (as a Solr n00b) to tell whether this is really all that's involved:
https://issues.apache.org/jira/browse/SOLR-1583
https://issues.apache.org/jira/browse/SOLR-1358

Sixten
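The question above mentions the binary DataSources that were added along with TikaEntityProcessor. As far as I can tell, the reason they matter is that DIH's existing file and URL sources hand entity processors a character `Reader`, while Tika needs the document's raw bytes. The JDK-only sketch that follows uses no Solr classes; the class name and sample bytes are made up for illustration. It shows what happens when a binary file such as a PDF is pushed through a UTF-8 `Reader` instead of being passed along as an `InputStream`:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration (no Solr classes): compares handing a binary
// document to a consumer as characters versus as raw bytes.
public class BinaryVsCharSource {

    // Start of a PDF-like file: a header line plus a comment of non-ASCII bytes.
    static final byte[] SAMPLE = {
        '%', 'P', 'D', 'F', '-', '1', '.', '4', '\n',
        '%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n'
    };

    // Character-oriented path: decode as UTF-8 through a Reader, then re-encode.
    // Byte sequences that are not valid UTF-8 are replaced with U+FFFD.
    static byte[] roundTripThroughReader(byte[] raw) {
        StringBuilder text = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(raw), StandardCharsets.UTF_8)) {
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Byte-oriented path: copy the InputStream unchanged.
    static byte[] readThroughStream(byte[] raw) {
        try (InputStream in = new ByteArrayInputStream(raw)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("stream copy intact: " + Arrays.equals(SAMPLE, readThroughStream(SAMPLE))); // true
        System.out.println("reader copy intact: " + Arrays.equals(SAMPLE, roundTripThroughReader(SAMPLE))); // false
    }
}
```

The stream copy is byte-for-byte identical, while the Reader round trip replaces the invalid UTF-8 bytes. A byte-oriented data source avoids that kind of corruption before Tika ever sees the file.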