Lance, I think the work is already split the way you suggest given what I know about what Erik has done (rewrite in Java) and what's left (add JMS). Adding message queue capabilities should not hold back reviewing the proposed changes.
I would say that it needs to meet these opening criteria for my general acceptance:

* Be functionally equal with the current solution.
* Strike a combination of performance and maintainability:
  * Performance can be no worse overall. There might be different hotspots in the Java version than in the current Ruby solution, but there shouldn't be anything exponentially worse. Overall, the Java version has to perform at least as well and hopefully better. Memory usage, overall processing time, and resource usage (IOPS, disk reads, caching) should all be considered.
  * Be more maintainable than the Ruby solution. The current code has had very little cleaning and is not very readable. This includes using externally available libraries where possible. We shouldn't be maintaining plumbing not inherent to our domain.
* Be easier to set up. Though our current setup for the Ruby PP is known to be problematic, we are at least accustomed to it. The proposed solution has got to be more straightforward and less fragile.

The numbers I've seen from some preliminary testing showed the Java impl taking dramatically *less* time to process PDFs, and it was faster than the Ruby PP in every test. It's an OSGi bundle and written in Java like the rest of our project, which makes it easier to set up and maintain as we write far more Java code than Ruby. I believe there's also already a setup available to run the Java PP as a standalone server.

The Java version introduces a topia term extractor bundle, which is a port from the Python version. This is a point of maintenance to consider, but the Python code hasn't changed in years. It's a commonly ported implementation in other languages, but there wasn't a Java version around. I would like to see this code find a permanent home in a relevant OSS project. At the very least it should be maintained apart from OAE core to make it available to a broader audience.

+1 to getting this code wrapped up and reviewed.
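To make the "no worse overall" criterion a bit more concrete, here is a minimal sketch of how a single processing run could be timed and compared against a baseline. Everything in it is an assumption for illustration: the PreviewBenchmark class, its measure and noWorse methods, and the 10% tolerance are hypothetical and not part of the Java PP. The heap figure from Runtime is only a coarse approximation, since garbage collection can run at any point.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the Java PP: times a task and reports
// approximate heap growth so two implementations can be compared.
class PreviewBenchmark {

    /** One measured run: wall-clock time and approximate change in used heap. */
    static final class Measurement {
        final long elapsedNanos;
        final long heapDeltaBytes;

        Measurement(long elapsedNanos, long heapDeltaBytes) {
            this.elapsedNanos = elapsedNanos;
            this.heapDeltaBytes = heapDeltaBytes;
        }

        long elapsedMillis() {
            return TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
        }
    }

    /** Runs the task once and records elapsed time and the used-heap difference. */
    static Measurement measure(Runnable task) {
        Runtime rt = Runtime.getRuntime();
        long heapBefore = rt.totalMemory() - rt.freeMemory();
        long start = System.nanoTime();
        task.run();
        long elapsed = System.nanoTime() - start;
        long heapAfter = rt.totalMemory() - rt.freeMemory();
        return new Measurement(elapsed, heapAfter - heapBefore);
    }

    /** True if the candidate value is within the tolerance of the baseline (lower is better). */
    static boolean noWorse(long baseline, long candidate, double tolerance) {
        return candidate <= baseline * (1.0 + tolerance);
    }

    public static void main(String[] args) {
        // Stand-in workload; a real comparison would process the same set of
        // documents with each preview processor.
        Measurement m = measure(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        });
        System.out.println("elapsed ms: " + m.elapsedMillis());
        System.out.println("approx heap delta bytes: " + m.heapDeltaBytes);
    }
}
```

A real comparison would want several runs over the same document set (PDFs and the png/jpg/gif/psd formats Christian mentions) and would record IO as well, which this sketch does not attempt.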
On Wed, Jul 18, 2012 at 1:51 PM, Christian Vuerings <vueringschrist...@gmail.com> wrote:

> I'm not sure whether this is already part of the criteria list or not, but
> what about CPU/Memory usage?
> Is there a way we can measure that and compare it to the current ruby
> based PP?
> When I currently run the ruby PP locally, it's usually one of the
> processes that uses the most resources.
>
> One other thing I'm curious about is how well it will compress/handle the
> different file formats (png/jpg/gif/psd)
>
> These are just 2 things that I'm interested in since they (can) have an
> impact on the overall performance.
>
> - Christian
>
> On Jul 18, 2012, at 12:41 PM, Lance Speelmon wrote:
>
> Does anyone have an opinion about adopting the new java based PP?
> Specifically can you articulate acceptance criteria for such an adoption?
> e.g.
>
>    1. Must support same preview behaviors as existing ruby-based PP.
>    2. Must pass QA with all blocker and critical items resolved.
>    3. Must start automatically OOTB to support the tire-kicking,
>    web-start uses.
>    4. Must leverage as much 3rd party code as possible to minimize
>    ownership costs.
>    5. Must pass code review.
>    6. Unit test code coverage.
>    7. Basic config and deployment documentation.
>
> What is missing? Anything? Thanks, L
>
> On Jul 17, 2012, at 2:58 PM, Lance Speelmon <la...@rsmart.com> wrote:
>
> Is there any way to break this work down into chunks? e.g.
>
> 1. Adopt java PP as default PP moving forward. What are the acceptance
> criteria?
> 2. Enhance new java PP with message queue abilities.
>
> WDYT? Thanks, L
>
> On Jul 17, 2012, at 8:34 AM, Carl Hall <c...@hallwaytech.com> wrote:
>
> Each app server could run its own queues but that wouldn't support
> building a farm of PP processors unless we also teach them to talk to
> multiple JMS servers. Maybe something like DNS round-robin would suffice?
>
> On Tue, Jul 17, 2012 at 8:25 AM, Erik Froese <erik.fro...@gmail.com> wrote:
>
>> Do we need to cluster activemq? Can't each app server service its own
>> queues?
>> Erik
>>
>> On Tue, Jul 17, 2012 at 11:23 AM, Carl Hall <c...@hallwaytech.com> wrote:
>> > What Erik describes has been on the dev wish list for a little while now.
>> > Moving to an event-driven model would allow us to build out concurrency
>> > but there also comes the question of clustering ActiveMQ.
>> >
>> > On Thu, Jul 12, 2012 at 6:27 AM, Erik Froese <erik.fro...@gmail.com> wrote:
>> >>
>> >> Hey David,
>> >>
>> >> The code is not clustered.
>> >>
>> >> You'd need to write an event listener that would fire when new content
>> >> is uploaded. It would put the content ids on a JMS queue. Then
>> >> implement a ContentFetcher that grabs a message off of the queue and
>> >> wire that into the PPI. Events and Messages are not clustered in OAE
>> >> (AFAIK) so this would have to be run on each app server.
>> >>
>> >> While we're in event-land it'd be nice to be able to regenerate a
>> >> preview when a content body is updated. I'm not sure if this is
>> >> possible yet.
>> >>
>> >> I'm not sure how we'd limit the CPU usage yet either. You could manage
>> >> the quartz schedule that runs the PPI.
>> >>
>> >> We can also disable concurrent executions of the job.
>> >>
>> >> Erik
>> >>
>> >> On Wed, Jul 11, 2012 at 8:44 PM, Roma, David <dr...@csu.edu.au> wrote:
>> >> > Awesome news Erik!
>> >> >
>> >> > Our Ops guys will be stoked when we can get this in.. A couple of
>> >> > questions from someone who hasn't looked at the code or read too deeply....
>> >> > - Does it support clustering
>> >> >     -e.g. can we just run it side by side on each of our app servers
>> >> >      and they will play nice sharing out processing jobs?
>> >> >     -will it affect performance of the app servers much? Can we
>> >> >      limit the preview processor to say 10% CPU and 500 MB RAM or low
>> >> >      priority threads or limit the number of items to process or
>> >> >      something? This would make for a nice simple deployment that
>> >> >      wouldn't threaten the app server stability.
>> >> >
>> >> > Cheers,
>> >> > Dave.
>> >> >
>> >> > -----Original Message-----
>> >> > From: oae-dev-boun...@collab.sakaiproject.org
>> >> > [mailto:oae-dev-boun...@collab.sakaiproject.org] On Behalf Of Erik Froese
>> >> > Sent: Thursday, 12 July 2012 2:37 AM
>> >> > To: Carl Hall
>> >> > Cc: oae-dev@collab.sakaiproject.org; Clay Fenlason
>> >> > Subject: Re: [oae-dev] Moving the preview processor to java
>> >> >
>> >> > Hey everyone,
>> >> >
>> >> > It's been a few months but I actually implemented the Java preview
>> >> > processor as an OSGi bundle. I filed a ticket for it [1]
>> >> >
>> >> > I'm not sure where to go from here. Is this something that could be
>> >> > included POST 1.4.0?
>> >> > Should I open a PR so we can review the code? If so, PR against which
>> >> > branch?
>> >> >
>> >> > Either way, have a look, give it a go. We'll probably wind up using it
>> >> > at rSmart.
>> >> >
>> >> > Erik
>> >> >
>> >> > [1] https://jira.sakaiproject.org/browse/KERN-3021
>> >> >
>> >> > On Tue, Apr 17, 2012 at 9:09 AM, Carl Hall <c...@hallwaytech.com> wrote:
>> >> >> I totally agree that we should ally ourselves with other communities.
>> >> >> I see where we get docsplit from DocumentCloud[1] and we use several
>> >> >> other libraries for processing that they've most likely contributed to.
>> >> >> The Java approach is very little custom code compared to the libraries
>> >> >> we're getting from Apache (tika, sanselan, commons, pdfbox), so we
>> >> >> would still be building on the shoulders of our friendly community
>> >> >> giants.
>> >> >>
>> >> >> [1] https://github.com/documentcloud/docsplit
>> >> >>
>> >> >> On Sat, Apr 14, 2012 at 5:43 AM, John Norman <j...@caret.cam.ac.uk> wrote:
>> >> >>>
>> >> >>> My recollection (perhaps wrong) is that we got this from Document
>> >> >>> Cloud and I /think/ Chris Roby found it. Document Cloud seems a very
>> >> >>> relevant and valuable project. If we were able to help them while
>> >> >>> helping ourselves, other good things could come from the
>> >> >>> relationship. My general point is that we are thin on resources and
>> >> >>> so, in principle, symbiotic relationships are helpful.
>> >> >>>
>> >> >>> http://www.documentcloud.org/home
>> >> >>>
>> >> >>> John
>> >> >>>
>> >> >>> Sent from my iPad
>> >> >>>
>> >> >>> On 13 Apr 2012, at 17:03, Carl Hall <c...@hallwaytech.com> wrote:
>> >> >>>
>> >> >>> I agree with Daniel that our modifications to the preview processor
>> >> >>> have put its ownership square on us. Was there a community that this
>> >> >>> script was borrowed from? I thought it was original development that
>> >> >>> uses various external libraries to do the actual work. This is the
>> >> >>> approach that Erik is taking with the rewrite using things like Tika
>> >> >>> (text extraction), Sanselan (images) and a Java port of the python
>> >> >>> topia.termextract library.
>> >> >>>
>> >> >>> I certainly don't deny the speed of development that was realized in
>> >> >>> creating the PP but the current state of the code is a mess at best.
>> >> >>> Reuse of libraries in Java is showing a fast rewrite with very little
>> >> >>> managed code on our part.
>> >> >>>
>> >> >>> On Fri, Apr 13, 2012 at 12:50 AM, Daniel Parry <dan...@caret.cam.ac.uk> wrote:
>> >> >>>>
>> >> >>>> On Thu, Apr 12, 2012 at 04:21:36PM -0400, Clay Fenlason wrote:
>> >> >>>> > I think this response is at best orthogonal to the point John's
>> >> >>>> > trying to raise, though I gather this kind of reaction must come
>> >> >>>> > from a buildup of some real frustration around the PP, which I
>> >> >>>> > don't mean to discount. I also think John was pretty clear about
>> >> >>>> > what he was suggesting: that there be a conversation with the
>> >> >>>> > community we got the PP from, if the conversation hasn't happened
>> >> >>>> > already, to see if there might still be a way to work together
>> >> >>>> > before we decide to just own it ourselves.
>> >> >>>>
>> >> >>>> I'd suggest the way that the preview processor was being extended
>> >> >>>> (initially a python server add on, followed by a ruby rewrite for
>> >> >>>> tag extraction) and the variety of ruby versions that deployers were
>> >> >>>> using and the methods used to deploy it were indicative of a) the
>> >> >>>> OAE community already 'owning' the PP and b) as has already been
>> >> >>>> pointed out some standardization needed restoring and additional
>> >> >>>> functionality added for deployers. Hence, the list was pinged[0] a
>> >> >>>> while back to ask about standardizing and extending in java. I'm not
>> >> >>>> sure of any other way to contact the original PP community or if
>> >> >>>> such a community separate to OAE even still exists?
>> >> >>>>
>> >> >>>> Best wishes,
>> >> >>>>
>> >> >>>> Daniel
>> >> >>>>
>> >> >>>> [0] http://collab.sakaiproject.org/pipermail/oae-dev/2012-April/001677.html
>> >> >>>>
>> >> >>>> --
>> >> >>>> --| Daniel Parry: dan...@caret.cam.ac.uk. www.caret.cam.ac.uk/|--
>> >> >>>> "Of all the things a leader should fear, complacency should
>> >> >>>> head the list." [John C. Maxwell]
>> >> >>>> _______________________________________________
>> >> >>>> oae-dev mailing list
>> >> >>>> oae-dev@collab.sakaiproject.org
>> >> >>>> http://collab.sakaiproject.org/mailman/listinfo/oae-dev