That sounds awesome, Nico. Thanks
On Tue, Jul 24, 2012 at 9:51 AM, Nicolaas Matthijs <nicolaas.matth...@caret.cam.ac.uk> wrote:
> Looks like this has been hanging around on list for a while now, and we
> should probably try to move it forwards.
>
> The maintainability criterion can only be determined by a code review, which
> is standard practice. However, as this is proving to be such a critical
> feature in production, I'd suggest that we do a separate bugbash to evaluate
> its performance, ease of setup (running from a separate machine) and most
> importantly functional equivalence.
>
> When doing this, Kent can give his assessment of the ease of setup and the
> bugbashers can determine functional equivalence. We should also try to have
> it re-process the dummy content we usually bugbash with.
>
> If this all sounds good, I'd like to go ahead with this as soon as possible
> and run a bugbash straight after the 1.4.0 release with all of this set up.
> If the implementation survives the bugbash, it can be reviewed and merged.
>
> Does that sound reasonable?
>
> Thanks,
> Nicolaas
>
> On 23 Jul 2012, at 07:42, Carl Hall wrote:
>
> Lance, I think the work is already split the way you suggest given what I
> know about what Erik has done (rewrite in Java) and what's left (add JMS).
> Adding message queue capabilities should not hold back reviewing the
> proposed changes.
>
> I would say that it needs to meet these opening criteria for my general
> acceptance:
>
> * Be functionally equal with the current solution
> * A combination of performance and maintainability
> * Performance can be no worse overall. There might be different hotspots in
> the java version than the current ruby solution but there shouldn't be
> anything exponentially worse. Overall, the java version has to perform at
> least as well and hopefully better. Memory usage, overall processing time,
> resource usage (iops, disc reads, caching) should all be considered.
> * Be more maintainable than the Ruby solution.
> The current code has had very little cleaning and is not very readable. This
> includes using externally available libraries where possible. We shouldn't be
> maintaining plumbing not inherent to our domain.
> * Easier to set up. Though our current setup for the ruby PP is known to be
> problematic, we at least are accustomed to it. The proposed solution has got
> to be more straightforward and less fragile.
>
> The numbers I've seen from some preliminary testing showed the Java impl to
> take exponentially *less* time to process pdfs and was faster than the ruby
> PP in every test. It's an OSGi bundle and written in Java like the rest of
> our project which makes it easier to set up and maintain as we write far more
> java code than ruby. I believe there's also already a setup available to run
> the java PP as a standalone server.
> The Java version introduces a topia term extractor bundle which is a port
> from the Python version. This is a point of maintenance to consider but the
> python code hasn't changed in years. It's a common impl for other languages to
> port but there wasn't a java version around. I would like to see this code
> find a permanent home in a relevant OSS project. At the very least it should
> be maintained apart from OAE core to make it available to a broader
> audience.
>
> +1 to getting this code wrapped up and reviewed.
>
> On Wed, Jul 18, 2012 at 1:51 PM, Christian Vuerings
> <vueringschrist...@gmail.com> wrote:
>>
>> I'm not sure whether this is already part of the criteria list or not, but
>> what about CPU/Memory usage?
>> Is there a way we can measure that and compare it to the current ruby
>> based PP?
>> When I currently run the ruby PP locally, it's usually one of the
>> processes that uses the most resources.
>>
>> One other thing I'm curious about is how well it will compress/handle the
>> different file formats (png/jpg/gif/psd)
>>
>> These are just 2 things that I'm interested in since they (can) have an
>> impact on the overall performance.
>>
>> - Christian
>>
>> On Jul 18, 2012, at 12:41 PM, Lance Speelmon wrote:
>>
>> Does anyone have an opinion about adopting the new java based PP?
>> Specifically can you articulate acceptance criteria for such an adoption?
>> e.g.
>>
>> Must support same preview behaviors as existing ruby-based PP.
>> Must pass QA with all blocker and critical items resolved.
>> Must start automatically OOTB to support the tire-kicking, web-start uses.
>> Must leverage as much 3rd party code as possible to minimize ownership
>> costs.
>> Must pass code review.
>> Unit test code coverage.
>> Basic config and deployment documentation.
>>
>> What is missing? Anything? Thanks, L
>>
>> On Jul 17, 2012, at 2:58 PM, Lance Speelmon <la...@rsmart.com> wrote:
>>
>> Is there any way to break this work down into chunks? e.g.
>>
>> 1. Adopt java PP as default PP moving forward. What are the acceptance
>> criteria?
>> 2. Enhance new java PP with message queue abilities.
>>
>> WDYT? Thanks, L
>>
>> On Jul 17, 2012, at 8:34 AM, Carl Hall <c...@hallwaytech.com> wrote:
>>
>> Each app server could run its own queues but that wouldn't support
>> building a farm of PP processors unless we also teach them to talk to
>> multiple JMS servers. Maybe something like DNS round-robin would suffice?
>>
>> On Tue, Jul 17, 2012 at 8:25 AM, Erik Froese <erik.fro...@gmail.com>
>> wrote:
>>>
>>> Do we need to cluster activemq? Can't each app server service its own
>>> queues?
>>> Erik
>>>
>>> On Tue, Jul 17, 2012 at 11:23 AM, Carl Hall <c...@hallwaytech.com> wrote:
>>> > What Erik describes has been on the dev wish list for a little while now.
>>> > Moving to an event-driven model would allow us to build out concurrency but
>>> > there also comes the question of clustering ActiveMQ.
>>> >
>>> > On Thu, Jul 12, 2012 at 6:27 AM, Erik Froese <erik.fro...@gmail.com> wrote:
>>> >>
>>> >> Hey David,
>>> >>
>>> >> The code is not clustered.
>>> >>
>>> >> You'd need to write an event listener that would fire when new content
>>> >> is uploaded. It would put the content ids on a JMS queue. Then
>>> >> implement a ContentFetcher that grabs a message off of the queue and
>>> >> wire that into the PPI. Events and Messages are not clustered in OAE
>>> >> (AFAIK) so this would have to be run on each app server.
>>> >>
>>> >> While we're in event-land it'd be nice to be able to regenerate a
>>> >> preview when a content body is updated. I'm not sure if this is
>>> >> possible yet.
>>> >>
>>> >> I'm not sure how we'd limit the CPU usage yet either. You could manage
>>> >> the quartz schedule that runs the PPI.
>>> >>
>>> >> We can also disable concurrent executions of the job.
>>> >>
>>> >> Erik
>>> >>
>>> >> On Wed, Jul 11, 2012 at 8:44 PM, Roma, David <dr...@csu.edu.au> wrote:
>>> >> > Awesome news Erik!
>>> >> >
>>> >> > Our Ops guys will be stoked when we can get this in.. A couple of
>>> >> > questions from someone who hasn't looked at the code or read too
>>> >> > deeply....
>>> >> > - Does it support clustering
>>> >> >     -e.g. can we just run it side by side on each of our app servers
>>> >> >      and they will play nice sharing out processing jobs?
>>> >> >     -will it affect performance of the app servers much? Can we
>>> >> >      limit the preview processor to say 10%cpu and 500mb ram or low
>>> >> >      priority threads or limit the number of items to process or
>>> >> >      something? This would make for a nice simple deployment that
>>> >> >      wouldn't threaten the app server stability.
>>> >> >
>>> >> > Cheers,
>>> >> > Dave.
>>> >> >
>>> >> > -----Original Message-----
>>> >> > From: oae-dev-boun...@collab.sakaiproject.org
>>> >> > [mailto:oae-dev-boun...@collab.sakaiproject.org] On Behalf Of Erik Froese
>>> >> > Sent: Thursday, 12 July 2012 2:37 AM
>>> >> > To: Carl Hall
>>> >> > Cc: oae-dev@collab.sakaiproject.org; Clay Fenlason
>>> >> > Subject: Re: [oae-dev] Moving the preview processor to java
>>> >> >
>>> >> > Hey everyone,
>>> >> >
>>> >> > It's been a few months but I actually implemented the Java preview
>>> >> > processor as an OSGi bundle. I filed a ticket for it [1]
>>> >> >
>>> >> > I'm not sure where to go from here. Is this something that could be
>>> >> > included POST 1.4.0?
>>> >> > Should I open a PR so we can review the code? If so, PR against which
>>> >> > branch?
>>> >> >
>>> >> > Either way, have a look, give it a go. We'll probably wind up using it
>>> >> > at rSmart.
>>> >> >
>>> >> > Erik
>>> >> >
>>> >> > [1] https://jira.sakaiproject.org/browse/KERN-3021
>>> >> >
>>> >> > On Tue, Apr 17, 2012 at 9:09 AM, Carl Hall <c...@hallwaytech.com> wrote:
>>> >> >> I totally agree that we should ally ourselves with other communities. I
>>> >> >> see where we get docsplit from DocumentCloud[1] and we use several other
>>> >> >> libraries for processing that they've most likely contributed to.
>>> >> >> The Java approach is very little custom code compared to the libraries
>>> >> >> we're getting from Apache (tika, sanselan, commons, pdfbox), so we would
>>> >> >> still be building on the shoulders of our friendly community giants.
>>> >> >>
>>> >> >> [1] https://github.com/documentcloud/docsplit
>>> >> >>
>>> >> >> On Sat, Apr 14, 2012 at 5:43 AM, John Norman <j...@caret.cam.ac.uk> wrote:
>>> >> >>>
>>> >> >>> My recollection (perhaps wrong) is that we got this from Document Cloud
>>> >> >>> and I /think/ Chris Roby found it. Document Cloud seems a very relevant
>>> >> >>> and valuable project. If we were able to help them while helping
>>> >> >>> ourselves, other good things could come from the relationship. My general
>>> >> >>> point is that we are thin on resources and so, in principle, symbiotic
>>> >> >>> relationships are helpful.
>>> >> >>>
>>> >> >>> http://www.documentcloud.org/home
>>> >> >>>
>>> >> >>> John
>>> >> >>>
>>> >> >>> Sent from my iPad
>>> >> >>>
>>> >> >>> On 13 Apr 2012, at 17:03, Carl Hall <c...@hallwaytech.com> wrote:
>>> >> >>>
>>> >> >>> I agree with Daniel that our modifications to the preview processor have
>>> >> >>> put its ownership square on us. Was there a community that this script
>>> >> >>> was borrowed from? I thought it was original development that uses
>>> >> >>> various external libraries to do the actual work. This is the approach
>>> >> >>> that Erik is taking with the rewrite using things like Tika (text
>>> >> >>> extraction), Sanselan (images) and a Java port of the python
>>> >> >>> topia.termextract library.
>>> >> >>>
>>> >> >>> I certainly don't deny the speed of development that was realized in
>>> >> >>> creating the PP but the current state of the code is a mess at best.
>>> >> >>> Reuse of libraries in Java is showing a fast rewrite with very little
>>> >> >>> managed code on our part.
>>> >> >>>
>>> >> >>> On Fri, Apr 13, 2012 at 12:50 AM, Daniel Parry <dan...@caret.cam.ac.uk>
>>> >> >>> wrote:
>>> >> >>>>
>>> >> >>>> On Thu, Apr 12, 2012 at 04:21:36PM -0400, Clay Fenlason wrote:
>>> >> >>>> > I think this response is at best orthogonal to the point John's trying
>>> >> >>>> > to raise, though I gather this kind of reaction must come from a
>>> >> >>>> > buildup of some real frustration around the PP, which I don't mean to
>>> >> >>>> > discount. I also think John was pretty clear about what he was
>>> >> >>>> > suggesting: that there be a conversation with the community we got the
>>> >> >>>> > PP from, if the conversation hasn't happened already, to see if there
>>> >> >>>> > might still be a way to work together before we decide to just own it
>>> >> >>>> > ourselves.
>>> >> >>>>
>>> >> >>>> I'd suggest the way that the preview processor was being extended
>>> >> >>>> (initially a python server add on, followed by a ruby rewrite for tag
>>> >> >>>> extraction) and the variety of ruby versions that deployers were using
>>> >> >>>> and the methods used to deploy it were indicative of a) the OAE
>>> >> >>>> community already 'owning' the PP and b) as has already been pointed out
>>> >> >>>> some standardization needed restoring and additional functionality added
>>> >> >>>> for deployers. Hence, the list was pinged[0] a while back to ask about
>>> >> >>>> standardizing and extending in java. I'm not sure of any other way to
>>> >> >>>> contact the original PP community or if such a community separate to OAE
>>> >> >>>> even still exists?
>>> >> >>>>
>>> >> >>>> Best wishes,
>>> >> >>>>
>>> >> >>>> Daniel
>>> >> >>>>
>>> >> >>>> [0] http://collab.sakaiproject.org/pipermail/oae-dev/2012-April/001677.html
>>> >> >>>>
>>> >> >>>> --
>>> >> >>>> --| Daniel Parry: dan...@caret.cam.ac.uk. www.caret.cam.ac.uk/ |--
>>> >> >>>> "Of all the things a leader should fear, complacency should
>>> >> >>>> head the list." [John C. Maxwell]
>>> >> >>>> _______________________________________________
>>> >> >>>> oae-dev mailing list
>>> >> >>>> oae-dev@collab.sakaiproject.org
>>> >> >>>> http://collab.sakaiproject.org/mailman/listinfo/oae-dev