I wrote up how to run the Java PP as an OSGi service and as a standalone jar. Let me know if you need anything to get it up and running.
https://confluence.sakaiproject.org/display/KERNDOC/Using+the+Java+Preview+Processor Erik On Mon, Aug 13, 2012 at 11:53 AM, Erik Froese <erik.fro...@gmail.com> wrote: > We have a spring planning meeting at rSmart today. I'll put it on the > agenda for my work. > > I'm going on vacation 8/24 - 9/4 though so we'll have to work it out > before then. > > Erik > > On Mon, Aug 13, 2012 at 11:36 AM, Nicolaas Matthijs > <nicolaas.matth...@caret.cam.ac.uk> wrote: >> Hi Erik, >> >> How about trying to turn this work into an official sprint deliverable >> (co-ordinated with Kent)? Do you think you can free up some time for the >> next sprint (August 16 - August 30) and will that give us enough time to do >> remaining implementation, some testing and bug fixing? >> >> Thanks, >> Nicolaas >> >> >> >> >> On 25 Jul 2012, at 00:18, Erik Froese wrote: >> >>> I'm happy to let the Java PP testing slide to 1.5.0 >>> >>> There are some recent improvements in the ruby PP that I need to >>> implement. >>> * sakaidocs - (easy, call out to wkhtmltopdf) >>> * image previews in the same format as the original >>> >>> Erik >>> >>> On Tue, Jul 24, 2012 at 10:18 AM, Kent Fitzgerald <kentf...@umich.edu> >>> wrote: >>>> >>>> Several questions/comments. >>>> There has already been 1.4.1. release proposed for immediately following >>>> 1.4.0 that would be isolated to code reformatting . Which would take >>>> precedence? >>>> >>>> We should definitely do a bug bash. One of the dangers of doing a bug >>>> bash >>>> focused on the preview processor is that we'll likely have people >>>> uploading >>>> hundreds of files each. Subjectively, this could give the impression of >>>> decreased performance just because we're hitting it much harder. >>>> >>>> More importantly, in addition to the bug bash, we need to do controlled >>>> tests on processing time on different data types. 
I'd like to break it >>>> down >>>> by file types and have truly controlled tests. In addition to different >>>> file >>>> types, we'll need files of varying sizes to compare performance not just >>>> on >>>> quantity but on complexity. This needs to be compared to the performance >>>> of >>>> the current implementation. >>>> >>>> I think we all agree that this is an important feature that we shouldn't >>>> try >>>> to rush out the door. >>>> >>>> I have to read back through the thread, but is there set-up >>>> documentation? >>>> Currently we have a section on the OAE Configuration and Deployment page >>>> [1] >>>> for the preview processor. It contains multiple supporting external >>>> links >>>> that have proven confusing for many people trying to get the preview >>>> processor >>>> running locally. We'll need to make sure we have adequate documentation. >>>> >>>> As a side note, I will be out of the office starting this Friday through >>>> next week. >>>> >>>> >>>> [1] >>>> >>>> https://confluence.sakaiproject.org/display/3AK/OAE+Configuration+and+Deployment >>>> >>>> >>>> >>>> -- >>>> Kent Fitzgerald >>>> >>>> On Tuesday, July 24, 2012 at 9:51 AM, Nicolaas Matthijs wrote: >>>> >>>> Looks like this has been hanging around on list for a while now, and we >>>> should probably try to move it forwards. >>>> >>>> The maintainability criterion can only be determined by a code review, >>>> which >>>> is standard practice. However, as this is proving to be such a critical >>>> feature in production, I'd suggest that we do a separate bugbash to >>>> evaluate >>>> its performance, ease of setup (running from a separate machine) and most >>>> importantly functional equivalence. >>>> >>>> When doing this, Kent can give his assessment of the ease of setup and >>>> the >>>> bugbashers can determine functional equivalence. We should also try to >>>> have >>>> it re-process the dummy content we usually bugbash with.
>>>> >>>> If this all sounds good, I'd like to go ahead with this as soon as >>>> possible >>>> and run a bugbash straight after the 1.4.0 release with all of this set >>>> up. >>>> If the implementation survives the bugbash, it can be reviewed and >>>> merged. >>>> >>>> Does that sound reasonable? >>>> >>>> Thanks, >>>> Nicolaas >>>> >>>> >>>> >>>> On 23 Jul 2012, at 07:42, Carl Hall wrote: >>>> >>>> Lance, I think the work is already split the way you suggest given what I >>>> know about what Erik has done (rewrite in Java) and what's left (add >>>> JMS). >>>> Adding message queue capabilities should not hold back reviewing the >>>> proposed changes. >>>> >>>> I would say that it needs to meet these opening criteria for my general >>>> acceptance: >>>> >>>> * Be functionally equal with the current solution >>>> * A combination of performance and maintainability >>>> * Performance can be no worse overall. There might be different hotspots in >>>> the java version than the current ruby solution but there shouldn't be >>>> anything exponentially worse. Overall, the java version has to perform at >>>> least as well and hopefully better. Memory usage, overall processing >>>> time, >>>> resource usage (iops, disc reads, caching) should all be considered. >>>> * Be more maintainable than the Ruby solution. The current code has had >>>> very little cleaning and is not very readable. This includes using >>>> externally available libraries where possible. We shouldn't be >>>> maintaining >>>> plumbing not inherent to our domain. >>>> * Easier to set up. Though our current setup for the ruby PP is known to >>>> be >>>> problematic, we at least are accustomed to it. The proposed solution has >>>> got >>>> to be more straightforward and less fragile. >>>> >>>> The numbers I've seen from some preliminary testing showed the Java impl >>>> to >>>> take exponentially *less* time to process pdfs and was faster than the >>>> ruby >>>> PP in every test.
It's an OSGi bundle and written in Java like the rest >>>> of >>>> our project which makes it easier to set up and maintain as we write far >>>> more >>>> java code than ruby. I believe there's also already a setup available to >>>> run >>>> the java PP as a standalone server. >>>> The Java version introduces a topia term extractor bundle which is a port >>>> from the Python version. This is a point of maintenance to consider but >>>> the >>>> python code hasn't changed in years. It's a common impl for other languages >>>> to >>>> port but there wasn't a java version around. I would like to see this >>>> code >>>> find a permanent home in a relevant OSS project. At the very least it >>>> should >>>> be maintained apart from OAE core to make it available to a broader >>>> audience. >>>> >>>> +1 to getting this code wrapped up and reviewed. >>>> >>>> On Wed, Jul 18, 2012 at 1:51 PM, Christian Vuerings >>>> <vueringschrist...@gmail.com> wrote: >>>> >>>> I'm not sure whether this is already part of the criteria list or not, >>>> but >>>> what about CPU/Memory usage? >>>> Is there a way we can measure that and compare it to the current ruby >>>> based >>>> PP? >>>> When I currently run the ruby PP locally, it's usually one of the >>>> processes >>>> that uses the most resources. >>>> >>>> One other thing I'm curious about is how well it will compress/handle the >>>> different file formats (png/jpg/gif/psd) >>>> >>>> These are just 2 things that I'm interested in since they (can) have an >>>> impact on the overall performance. >>>> >>>> >>>> - Christian >>>> >>>> On Jul 18, 2012, at 12:41 PM, Lance Speelmon wrote: >>>> >>>> Does anyone have an opinion about adopting the new java based PP? >>>> Specifically can you articulate acceptance criteria for such an adoption? >>>> e.g. >>>> >>>> Must support same preview behaviors as existing ruby-based PP. >>>> Must pass QA with all blocker and critical items resolved.
>>>> Must start automatically OOTB to support the tire-kicking, web-start >>>> uses. >>>> Must leverage as much 3rd party code as possible to minimize ownership >>>> costs. >>>> Must pass code review. >>>> Unit test code coverage. >>>> Basic config and deployment documentation. >>>> >>>> >>>> What is missing? Anything? Thanks, L >>>> >>>> >>>> >>>> On Jul 17, 2012, at 2:58 PM, Lance Speelmon <la...@rsmart.com> wrote: >>>> >>>> Is there any way to break this work down into chunks? e.g. >>>> >>>> 1. Adopt java PP as default PP moving forward. What are the acceptance >>>> criteria? >>>> 2. Enhance new java PP with message queue abilities. >>>> >>>> WDYT? Thanks, L >>>> >>>> On Jul 17, 2012, at 8:34 AM, Carl Hall <c...@hallwaytech.com> wrote: >>>> >>>> Each app server could run its own queues but that wouldn't support >>>> building >>>> a farm of PP processors unless we also teach them to talk to multiple JMS >>>> servers. Maybe something like DNS round-robin would suffice? >>>> >>>> On Tue, Jul 17, 2012 at 8:25 AM, Erik Froese <erik.fro...@gmail.com> >>>> wrote: >>>> >>>> Do we need to cluster activemq? Can't each app server service its own >>>> queues? >>>> Erik >>>> >>>> On Tue, Jul 17, 2012 at 11:23 AM, Carl Hall <c...@hallwaytech.com> wrote: >>>>> >>>>> What Erik describes has been on the dev wish list for a little while >>>>> now. >>>>> Moving to an event-driven model would allow us to build out concurrency >>>>> but >>>>> there also comes the question of clustering ActiveMQ. >>>>> >>>>> >>>>> On Thu, Jul 12, 2012 at 6:27 AM, Erik Froese <erik.fro...@gmail.com> >>>>> wrote: >>>>>> >>>>>> >>>>>> Hey David, >>>>>> >>>>>> The code is not clustered. >>>>>> >>>>>> You'd need to write an event listener that would fire when new content >>>>>> is uploaded. It would put the content ids on a JMS queue. Then >>>>>> implement a ContentFetcher that grabs a message off of the queue and >>>>>> wire that into the PPI.
Events and Messages are not clustered in OAE >>>>>> (AFAIK) so this would have to be run on each app server. >>>>>> >>>>>> While we're in event-land it'd be nice to be able to regenerate a >>>>>> preview when a content body is updated. I'm not sure if this is >>>>>> possible yet. >>>>>> >>>>>> I'm not sure how we'd limit the CPU usage yet either. You could manage >>>>>> the quartz schedule that runs the PPI. >>>>>> >>>>>> We can also disable concurrent executions of the job. >>>>>> >>>>>> Erik >>>>>> >>>>>> On Wed, Jul 11, 2012 at 8:44 PM, Roma, David <dr...@csu.edu.au> wrote: >>>>>>> >>>>>>> Awesome news Erik! >>>>>>> >>>>>>> Our Ops guys will be stoked when we can get this in.. A couple of >>>>>>> questions from someone who hasn't looked at the code or read too >>>>>>> deeply.... >>>>>>> - Does it support clustering >>>>>>> -e.g. can we just run it side by side on each of our app >>>>>>> servers >>>>>>> and they will play nice sharing out processing jobs? >>>>>>> -will it affect performance of the app servers much? Can we >>>>>>> limit the preview processor to say 10%cpu and 500mb ram or low >>>>>>> priority >>>>>>> threads or limit the number of items to process or something? This >>>>>>> would >>>>>>> make for a nice simple deployment that wouldn't threaten the app >>>>>>> server >>>>>>> stability. >>>>>>> >>>>>>> Cheers, >>>>>>> Dave. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: oae-dev-boun...@collab.sakaiproject.org >>>>>>> [mailto:oae-dev-boun...@collab.sakaiproject.org] On Behalf Of Erik >>>>>>> Froese >>>>>>> Sent: Thursday, 12 July 2012 2:37 AM >>>>>>> To: Carl Hall >>>>>>> Cc: oae-dev@collab.sakaiproject.org; Clay Fenlason >>>>>>> Subject: Re: [oae-dev] Moving the preview processor to java >>>>>>> >>>>>>> Hey everyone, >>>>>>> >>>>>>> It's been a few months but I actually implemented the Java preview >>>>>>> processor as an OSGi bundle.
I filed a ticket for it [1] >>>>>>> >>>>>>> I'm not sure where to go from here. Is this something that could be >>>>>>> included POST 1.4.0? >>>>>>> Should I open a PR so we can review the code? If so, PR against which >>>>>>> branch? >>>>>>> >>>>>>> Either way, have a look, give it a go. We'll probably wind up using it >>>>>>> at rSmart. >>>>>>> >>>>>>> Erik >>>>>>> >>>>>>> [1] https://jira.sakaiproject.org/browse/KERN-3021 >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Apr 17, 2012 at 9:09 AM, Carl Hall <c...@hallwaytech.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> I totally agree that we should ally ourselves with other communities. >>>>>>>> I >>>>>>>> see >>>>>>>> where we get docsplit from DocumentCloud[1] and we use several other >>>>>>>> libraries for processing that they've most likely contributed to. >>>>>>>> The Java approach is very little custom code compared to the >>>>>>>> libraries >>>>>>>> we're >>>>>>>> getting from Apache (tika, sanselan, commons, pdfbox), so we would >>>>>>>> still >>>>>>>> be building on the shoulders of our friendly community giants. >>>>>>>> >>>>>>>> 1 https://github.com/documentcloud/docsplit >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Sat, Apr 14, 2012 at 5:43 AM, John Norman <j...@caret.cam.ac.uk> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> My recollection (perhaps wrong) is that we got this from Document >>>>>>>>> Cloud >>>>>>>>> and I /think/ Chris Roby found it. Document Cloud seems a very >>>>>>>>> relevant and >>>>>>>>> valuable project. If we were able to help them while helping >>>>>>>>> ourselves, >>>>>>>>> other good things could come from the relationship. My general point >>>>>>>>> is that >>>>>>>>> we are thin on resources and so, in principle, symbiotic >>>>>>>>> relationships >>>>>>>>> are >>>>>>>>> helpful.
>>>>>>>>> >>>>>>>>> http://www.documentcloud.org/home >>>>>>>>> >>>>>>>>> John >>>>>>>>> >>>>>>>>> Sent from my iPad >>>>>>>>> >>>>>>>>> On 13 Apr 2012, at 17:03, Carl Hall <c...@hallwaytech.com> wrote: >>>>>>>>> >>>>>>>>> I agree with Daniel that our modifications to the preview processor >>>>>>>>> have >>>>>>>>> put its ownership square on us. Was there a community that this >>>>>>>>> script >>>>>>>>> was >>>>>>>>> borrowed from? I thought it was original development that uses >>>>>>>>> various >>>>>>>>> external libraries to do the actual work. This is the approach that >>>>>>>>> Erik is >>>>>>>>> taking with the rewrite using things like Tika (text extraction), >>>>>>>>> Sanselan >>>>>>>>> (images) and a Java port of the python topia.termextract library. >>>>>>>>> >>>>>>>>> I certainly don't deny the speed of development that was realized in >>>>>>>>> creating the PP but the current state of the code is a mess at best. >>>>>>>>> Reuse >>>>>>>>> of libraries in Java is showing a fast rewrite with very little >>>>>>>>> managed code >>>>>>>>> on our part. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Apr 13, 2012 at 12:50 AM, Daniel Parry >>>>>>>>> <dan...@caret.cam.ac.uk> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Apr 12, 2012 at 04:21:36PM -0400, Clay Fenlason wrote: >>>>>>>>>>> >>>>>>>>>>> I think this response is at best orthogonal to the point John's >>>>>>>>>>> trying >>>>>>>>>>> to raise, though I gather this kind of reaction must come from a >>>>>>>>>>> buildup of some real frustration around the PP, which I don't mean >>>>>>>>>>> to >>>>>>>>>>> discount. I also think John was pretty clear about what he was >>>>>>>>>>> suggesting: that there be a conversation with the community we got >>>>>>>>>>> the >>>>>>>>>>> PP from, if the conversation hasn't happened already, to see if >>>>>>>>>>> there >>>>>>>>>>> might still be a way to work together before we decide to just own >>>>>>>>>>> it >>>>>>>>>>> ourselves. 
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I'd suggest the way that the preview processor was being extended >>>>>>>>>> (initially a >>>>>>>>>> python server add on, followed by a ruby rewrite for tag >>>>>>>>>> extraction) >>>>>>>>>> and >>>>>>>>>> the >>>>>>>>>> variety of ruby versions that deployers were using and the methods >>>>>>>>>> used >>>>>>>>>> to >>>>>>>>>> deploy it were indicative of a) the OAE community already 'owning' >>>>>>>>>> the PP >>>>>>>>>> and b) >>>>>>>>>> as has already been pointed out some standardization needed >>>>>>>>>> restoring >>>>>>>>>> and >>>>>>>>>> additional functionality added for deployers. Hence, the list was >>>>>>>>>> pinged[0] a >>>>>>>>>> while back to ask about standardizing and extending in java. I'm >>>>>>>>>> not >>>>>>>>>> sure >>>>>>>>>> of any >>>>>>>>>> other way to contact the original PP community or if such a >>>>>>>>>> community >>>>>>>>>> separate >>>>>>>>>> to OAE even still exists? >>>>>>>>>> >>>>>>>>>> Best wishes, >>>>>>>>>> >>>>>>>>>> Daniel >>>>>>>>>> >>>>>>>>>> [0] >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> http://collab.sakaiproject.org/pipermail/oae-dev/2012-April/001677.html >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> --| Daniel Parry: dan...@caret.cam.ac.uk. www.caret.cam.ac.uk/ |-- >>>>>>>>>> "Of all the things a leader should fear, complacency should >>>>>>>>>> head the list." [John C. 
Maxwell] >>>>>>>>>> _______________________________________________ >>>>>>>>>> oae-dev mailing list >>>>>>>>>> oae-dev@collab.sakaiproject.org >>>>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/oae-dev
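For anyone picking up the message-queue idea from the thread (an event listener puts the ids of newly uploaded content on a queue, and a ContentFetcher pulls them off for the preview processor), here is a minimal sketch of the shape it might take. Everything in it is an assumption for illustration: the real ContentFetcher interface in the Java PP is not shown in this thread, so the `ContentFetcher` signature, `UploadListener`, and `QueueContentFetcher` below are hypothetical, and an in-memory `BlockingQueue` stands in for the JMS queue so the example runs without a broker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PreviewQueueSketch {

    /** Hypothetical interface; the real ContentFetcher in the Java PP may differ. */
    interface ContentFetcher {
        List<String> fetchPending(int max);
    }

    /** Stand-in for the event listener that fires when new content is uploaded. */
    static class UploadListener {
        private final BlockingQueue<String> queue;

        UploadListener(BlockingQueue<String> queue) {
            this.queue = queue;
        }

        void onContentUploaded(String contentId) {
            // Assumption: only the content id is queued, not the content body.
            queue.offer(contentId);
        }
    }

    /** Fetcher that takes at most 'max' ids off the queue without blocking. */
    static class QueueContentFetcher implements ContentFetcher {
        private final BlockingQueue<String> queue;

        QueueContentFetcher(BlockingQueue<String> queue) {
            this.queue = queue;
        }

        @Override
        public List<String> fetchPending(int max) {
            List<String> ids = new ArrayList<>();
            queue.drainTo(ids, max); // caps the batch handled per scheduled run
            return ids;
        }
    }

    public static void main(String[] args) {
        // In-memory queue used here in place of a JMS queue.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        UploadListener listener = new UploadListener(queue);
        ContentFetcher fetcher = new QueueContentFetcher(queue);

        listener.onContentUploaded("content-1");
        listener.onContentUploaded("content-2");
        listener.onContentUploaded("content-3");

        System.out.println(fetcher.fetchPending(2)); // prints [content-1, content-2]
        System.out.println(fetcher.fetchPending(2)); // prints [content-3]
    }
}
```

Capping the batch size per run is one possible answer to the question about limiting the number of items processed at a time. It does nothing for CPU or memory limits, which the thread leaves open.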