Hi Erik,

How about trying to turn this work into an official sprint deliverable (coordinated with Kent)? Do you think you can free up some time for the next sprint (August 16 - August 30), and will that give us enough time to do the remaining implementation, some testing and bug fixing?
Thanks,
Nicolaas

On 25 Jul 2012, at 00:18, Erik Froese wrote:

> I'm happy to let the Java PP testing slide to 1.5.0
>
> There are some recent improvements in the ruby PP that I need to implement.
> * sakaidocs - (easy, call out to wkhtmltopdf)
> * image previews in the same format as the original
>
> Erik
>
> On Tue, Jul 24, 2012 at 10:18 AM, Kent Fitzgerald <kentf...@umich.edu> wrote:
>> Several questions/comments.
>> There has already been a 1.4.1 release proposed for immediately following 1.4.0 that would be isolated to code reformatting. Which would take precedence?
>>
>> We should definitely do a bug bash. One of the dangers of doing a bug bash focused on the preview processor is that we'll likely have people uploading hundreds of files each. Subjectively, this could give the impression of decreased performance just because we're hitting it much harder.
>>
>> More importantly, in addition to the bug bash, we need to do controlled tests on processing time on different data types. I'd like to break it down by file types and have truly controlled tests. In addition to different file types, we'll need files of varying sizes to compare performance not just on quantity but on complexity. This needs to be compared to the performance of the current implementation.
>>
>> I think we all agree that this is an important feature that we shouldn't try to rush out the door.
>>
>> I have to read back through the thread, but is there set-up documentation? Currently we have a section on the OAE Configuration and Deployment page [1] for the preview processor. It contains multiple supporting external links that have proven confusing for many people trying to get the preview processor running locally. We'll need to make sure we have adequate documentation.
>>
>> As a side note, I will be out of the office starting this Friday through next week.
>>
>> [1] https://confluence.sakaiproject.org/display/3AK/OAE+Configuration+and+Deployment
>>
>> --
>> Kent Fitzgerald
>>
>> On Tuesday, July 24, 2012 at 9:51 AM, Nicolaas Matthijs wrote:
>>
>> Looks like this has been hanging around on list for a while now, and we should probably try to move it forwards.
>>
>> The maintainability criterion can only be determined by a code review, which is standard practice. However, as this is proving to be such a critical feature in production, I'd suggest that we do a separate bugbash to evaluate its performance, ease of setup (running from a separate machine) and most importantly functional equivalence.
>>
>> When doing this, Kent can give his assessment of the ease of setup and the bugbashers can determine functional equivalence. We should also try to have it re-process the dummy content we usually bugbash with.
>>
>> If this all sounds good, I'd like to go ahead with this as soon as possible and run a bugbash straight after the 1.4.0 release with all of this set up. If the implementation survives the bugbash, it can be reviewed and merged.
>>
>> Does that sound reasonable?
>>
>> Thanks,
>> Nicolaas
>>
>> On 23 Jul 2012, at 07:42, Carl Hall wrote:
>>
>> Lance, I think the work is already split the way you suggest given what I know about what Erik has done (rewrite in Java) and what's left (add JMS). Adding message queue capabilities should not hold back reviewing the proposed changes.
>>
>> I would say that it needs to meet these opening criteria for my general acceptance:
>>
>> * Be functionally equal with the current solution
>> * A combination of performance and maintainability
>> * Performance can be no worse overall. There might be different hotspots in the java version than the current ruby solution but there shouldn't be anything exponentially worse.
>>   Overall, the java version has to perform at least as well and hopefully better. Memory usage, overall processing time, resource usage (iops, disc reads, caching) should all be considered.
>> * Be more maintainable than the Ruby solution. The current code has had very little cleaning and is not very readable. This includes using externally available libraries where possible. We shouldn't be maintaining plumbing not inherent to our domain.
>> * Easier to set up. Though our current setup for the ruby PP is known to be problematic, we at least are accustomed to it. The proposed solution has got to be more straightforward and less fragile.
>>
>> The numbers I've seen from some preliminary testing showed that the Java impl took exponentially *less* time to process pdfs and was faster than the ruby PP in every test. It's an OSGi bundle and written in Java like the rest of our project, which makes it easier to set up and maintain as we write far more java code than ruby. I believe there's also already a setup available to run the java PP as a standalone server.
>> The Java version introduces a topia term extractor bundle which is a port from the Python version. This is a point of maintenance to consider but the python code hasn't changed in years. It's a common impl for other languages to port but there wasn't a java version around. I would like to see this code find a permanent home in a relevant OSS project. At the very least it should be maintained apart from OAE core to make it available to a broader audience.
>>
>> +1 to getting this code wrapped up and reviewed.
>>
>> On Wed, Jul 18, 2012 at 1:51 PM, Christian Vuerings <vueringschrist...@gmail.com> wrote:
>>
>> I'm not sure whether this is already part of the criteria list or not, but what about CPU/Memory usage?
>> Is there a way we can measure that and compare it to the current ruby based PP?
>> When I currently run the ruby PP locally, it's usually one of the processes that uses the most resources.
>>
>> One other thing I'm curious about is how well it will compress/handle the different file formats (png/jpg/gif/psd).
>>
>> These are just 2 things that I'm interested in since they (can) have an impact on the overall performance.
>>
>> - Christian
>>
>> On Jul 18, 2012, at 12:41 PM, Lance Speelmon wrote:
>>
>> Does anyone have an opinion about adopting the new java based PP? Specifically can you articulate acceptance criteria for such an adoption? e.g.
>>
>> Must support same preview behaviors as existing ruby-based PP.
>> Must pass QA with all blocker and critical items resolved.
>> Must start automatically OOTB to support the tire-kicking, web-start uses.
>> Must leverage as much 3rd party code as possible to minimize ownership costs.
>> Must pass code review.
>> Unit test code coverage.
>> Basic config and deployment documentation.
>>
>> What is missing? Anything? Thanks, L
>>
>> On Jul 17, 2012, at 2:58 PM, Lance Speelmon <la...@rsmart.com> wrote:
>>
>> Is there any way to break this work down into chunks? e.g.
>>
>> 1. Adopt java PP as default PP moving forward. What are the acceptance criteria?
>> 2. Enhance new java PP with message queue abilities.
>>
>> WDYT? Thanks, L
>>
>> On Jul 17, 2012, at 8:34 AM, Carl Hall <c...@hallwaytech.com> wrote:
>>
>> Each app server could run its own queues but that wouldn't support building a farm of PP processors unless we also teach them to talk to multiple JMS servers. Maybe something like DNS round-robin would suffice?
>>
>> On Tue, Jul 17, 2012 at 8:25 AM, Erik Froese <erik.fro...@gmail.com> wrote:
>>
>> Do we need to cluster activemq? Can't each app server service its own queues?
>> Erik
>>
>> On Tue, Jul 17, 2012 at 11:23 AM, Carl Hall <c...@hallwaytech.com> wrote:
>>> What Erik describes has been on the dev wish list for a little while now. Moving to an event-driven model would allow us to build out concurrency but there also comes the question of clustering ActiveMQ.
>>>
>>> On Thu, Jul 12, 2012 at 6:27 AM, Erik Froese <erik.fro...@gmail.com> wrote:
>>>>
>>>> Hey David,
>>>>
>>>> The code is not clustered.
>>>>
>>>> You'd need to write an event listener that would fire when new content is uploaded. It would put the content ids on a JMS queue. Then implement a ContentFetcher that grabs a message off of the queue and wire that into the PPI. Events and Messages are not clustered in OAE (AFAIK) so this would have to be run on each app server.
>>>>
>>>> While we're in event-land it'd be nice to be able to regenerate a preview when a content body is updated. I'm not sure if this is possible yet.
>>>>
>>>> I'm not sure how we'd limit the CPU usage yet either. You could manage the quartz schedule that runs the PPI.
>>>>
>>>> We can also disable concurrent executions of the job.
>>>>
>>>> Erik
>>>>
>>>> On Wed, Jul 11, 2012 at 8:44 PM, Roma, David <dr...@csu.edu.au> wrote:
>>>>> Awesome news Erik!
>>>>>
>>>>> Our Ops guys will be stoked when we can get this in. A couple of questions from someone who hasn't looked at the code or read too deeply:
>>>>> - Does it support clustering?
>>>>>   - e.g. can we just run it side by side on each of our app servers and they will play nice sharing out processing jobs?
>>>>>   - Will it affect performance of the app servers much? Can we limit the preview processor to say 10% CPU and 500 MB RAM or low priority threads or limit the number of items to process or something? This would make for a nice simple deployment that wouldn't threaten the app server stability.
>>>>>
>>>>> Cheers,
>>>>> Dave.
>>>>>
>>>>> -----Original Message-----
>>>>> From: oae-dev-boun...@collab.sakaiproject.org [mailto:oae-dev-boun...@collab.sakaiproject.org] On Behalf Of Erik Froese
>>>>> Sent: Thursday, 12 July 2012 2:37 AM
>>>>> To: Carl Hall
>>>>> Cc: oae-dev@collab.sakaiproject.org; Clay Fenlason
>>>>> Subject: Re: [oae-dev] Moving the preview processor to java
>>>>>
>>>>> Hey everyone,
>>>>>
>>>>> It's been a few months but I actually implemented the Java preview processor as an OSGi bundle. I filed a ticket for it [1]
>>>>>
>>>>> I'm not sure where to go from here. Is this something that could be included POST 1.4.0?
>>>>> Should I open a PR so we can review the code? If so, PR against which branch?
>>>>>
>>>>> Either way, have a look, give it a go. We'll probably wind up using it at rSmart.
>>>>>
>>>>> Erik
>>>>>
>>>>> [1] https://jira.sakaiproject.org/browse/KERN-3021
>>>>>
>>>>> On Tue, Apr 17, 2012 at 9:09 AM, Carl Hall <c...@hallwaytech.com> wrote:
>>>>>> I totally agree that we should ally ourselves with other communities. I see where we get docsplit from DocumentCloud[1] and we use several other libraries for processing that they've most likely contributed to.
>>>>>> The Java approach is very little custom code compared to the libraries we're getting from Apache (tika, sanselan, commons, pdfbox), so we would still be building on the shoulders of our friendly community giants.
>>>>>>
>>>>>> [1] https://github.com/documentcloud/docsplit
>>>>>>
>>>>>> On Sat, Apr 14, 2012 at 5:43 AM, John Norman <j...@caret.cam.ac.uk> wrote:
>>>>>>>
>>>>>>> My recollection (perhaps wrong) is that we got this from Document Cloud and I /think/ Chris Roby found it.
>>>>>>> Document Cloud seems a very relevant and valuable project. If we were able to help them while helping ourselves, other good things could come from the relationship. My general point is that we are thin on resources and so, in principle, symbiotic relationships are helpful.
>>>>>>>
>>>>>>> http://www.documentcloud.org/home
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>> Sent from my iPad
>>>>>>>
>>>>>>> On 13 Apr 2012, at 17:03, Carl Hall <c...@hallwaytech.com> wrote:
>>>>>>>
>>>>>>> I agree with Daniel that our modifications to the preview processor have put its ownership squarely on us. Was there a community that this script was borrowed from? I thought it was original development that uses various external libraries to do the actual work. This is the approach that Erik is taking with the rewrite using things like Tika (text extraction), Sanselan (images) and a Java port of the python topia.termextract library.
>>>>>>>
>>>>>>> I certainly don't deny the speed of development that was realized in creating the PP but the current state of the code is a mess at best. Reuse of libraries in Java is showing a fast rewrite with very little managed code on our part.
>>>>>>>
>>>>>>> On Fri, Apr 13, 2012 at 12:50 AM, Daniel Parry <dan...@caret.cam.ac.uk> wrote:
>>>>>>>>
>>>>>>>> On Thu, Apr 12, 2012 at 04:21:36PM -0400, Clay Fenlason wrote:
>>>>>>>>> I think this response is at best orthogonal to the point John's trying to raise, though I gather this kind of reaction must come from a buildup of some real frustration around the PP, which I don't mean to discount.
>>>>>>>>> I also think John was pretty clear about what he was suggesting: that there be a conversation with the community we got the PP from, if the conversation hasn't happened already, to see if there might still be a way to work together before we decide to just own it ourselves.
>>>>>>>>
>>>>>>>> I'd suggest the way that the preview processor was being extended (initially a python server add on, followed by a ruby rewrite for tag extraction) and the variety of ruby versions that deployers were using and the methods used to deploy it were indicative of a) the OAE community already 'owning' the PP and b) as has already been pointed out, some standardization needed restoring and additional functionality added for deployers. Hence, the list was pinged[0] a while back to ask about standardizing and extending in java. I'm not sure of any other way to contact the original PP community or if such a community separate to OAE even still exists?
>>>>>>>>
>>>>>>>> Best wishes,
>>>>>>>>
>>>>>>>> Daniel
>>>>>>>>
>>>>>>>> [0] http://collab.sakaiproject.org/pipermail/oae-dev/2012-April/001677.html
>>>>>>>>
>>>>>>>> --
>>>>>>>> --| Daniel Parry: dan...@caret.cam.ac.uk. www.caret.cam.ac.uk/ |--
>>>>>>>> "Of all the things a leader should fear, complacency should head the list." [John C.
Maxwell]
>>>>>>>> _______________________________________________
>>>>>>>> oae-dev mailing list
>>>>>>>> oae-dev@collab.sakaiproject.org
>>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/oae-dev
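
The queue-driven flow Erik outlines in his 12 Jul 2012 message (an upload listener puts content ids on a queue, and a ContentFetcher takes them off for the preview processor) could be sketched roughly as follows. This is only an illustration under stated assumptions: an in-memory `BlockingQueue` stands in for the JMS queue, and the `ContentFetcher` method shape, `UploadListener`, `QueueContentFetcher` and `drain` are hypothetical names, not the actual OAE or KERN-3021 API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PreviewQueueSketch {

    /** Hypothetical shape for the ContentFetcher mentioned in the thread. */
    interface ContentFetcher {
        /** Returns the next content id to preview, or null if nothing is waiting. */
        String nextContentId();
    }

    /** Hypothetical listener, assumed to be called whenever new content is uploaded. */
    static class UploadListener {
        private final BlockingQueue<String> queue;

        UploadListener(BlockingQueue<String> queue) {
            this.queue = queue;
        }

        void onContentUploaded(String contentId) {
            // Stand-in for sending a message to a JMS queue.
            queue.add(contentId);
        }
    }

    /** Fetcher that takes content ids off the (stand-in) queue without blocking. */
    static class QueueContentFetcher implements ContentFetcher {
        private final BlockingQueue<String> queue;

        QueueContentFetcher(BlockingQueue<String> queue) {
            this.queue = queue;
        }

        @Override
        public String nextContentId() {
            return queue.poll();
        }
    }

    /** Takes ids until the fetcher runs dry and returns them in processing order. */
    static List<String> drain(ContentFetcher fetcher) {
        List<String> processed = new ArrayList<>();
        String id;
        while ((id = fetcher.nextContentId()) != null) {
            // A real preview processor would generate the preview for this id here.
            processed.add(id);
        }
        return processed;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        UploadListener listener = new UploadListener(queue);
        listener.onContentUploaded("content-1");
        listener.onContentUploaded("content-2");
        System.out.println(drain(new QueueContentFetcher(queue)));
    }
}
```

Running `main` prints the ids in upload order. In this shape each app server holds its own queue, which matches the "run on each app server" caveat in the thread rather than a shared farm of preview processors.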