Several questions/comments. A 1.4.1 release has already been proposed to immediately follow 1.4.0, isolated to code reformatting. Which would take precedence?
We should definitely do a bug bash. One danger of a bug bash focused on the preview processor is that we'll likely have people uploading hundreds of files each. Subjectively, this could give the impression of decreased performance just because we're hitting it much harder.

More importantly, in addition to the bug bash, we need to do controlled tests of processing time on different data types. I'd like to break it down by file type and have truly controlled tests. In addition to different file types, we'll need files of varying sizes so that we compare performance not just on quantity but on complexity. This needs to be compared to the performance of the current implementation.

I think we all agree that this is an important feature that we shouldn't try to rush out the door.

I have to read back through the thread, but is there set-up documentation? Currently we have a section on the OAE Configuration and Deployment page [1] for the preview processor. It contains multiple supporting external links that have proven confusing for many people trying to get the preview processor running locally. We'll need to make sure we have adequate documentation.

As a side note, I will be out of the office starting this Friday through next week.

[1] https://confluence.sakaiproject.org/display/3AK/OAE+Configuration+and+Deployment

--
Kent Fitzgerald

On Tuesday, July 24, 2012 at 9:51 AM, Nicolaas Matthijs wrote:
> Looks like this has been hanging around on list for a while now, and we
> should probably try to move it forwards.
>
> The maintainability criterion can only be determined by a code review, which
> is standard practice. However, as this is proving to be such a critical
> feature in production, I'd suggest that we do a separate bugbash to evaluate
> its performance, ease of setup (running from a separate machine) and most
> importantly functional equivalence.
>
> When doing this, Kent can give his assessment of the ease of setup and the
> bugbashers can determine functional equivalence. We should also try to have
> it re-process the dummy content we usually bugbash with.
>
> If this all sounds good, I'd like to go ahead with this as soon as possible
> and run a bugbash straight after the 1.4.0 release with all of this set up.
> If the implementation survives the bugbash, it can be reviewed and merged.
>
> Does that sound reasonable?
>
> Thanks,
> Nicolaas
>
> On 23 Jul 2012, at 07:42, Carl Hall wrote:
> > Lance, I think the work is already split the way you suggest given what I
> > know about what Erik has done (rewrite in Java) and what's left (add JMS).
> > Adding message queue capabilities should not hold back reviewing the
> > proposed changes.
> >
> > I would say that it needs to meet these opening criteria for my general
> > acceptance:
> >
> > * Be functionally equal with the current solution
> > * A combination of performance and maintainability
> > * Performance can be no worse overall. There might be different hotspots in
> > the java version than the current ruby solution but there shouldn't be
> > anything exponentially worse. Overall, the java version has to perform at
> > least as well and hopefully better. Memory usage, overall processing time,
> > resource usage (iops, disc reads, caching) should all be considered.
> > * Be more maintainable than the Ruby solution. The current code has had
> > very little cleaning and is not very readable. This includes using
> > externally available libraries where possible. We shouldn't be maintaining
> > plumbing not inherent to our domain.
> > * Be easier to set up. Though our current setup for the ruby PP is known to
> > be problematic, we at least are accustomed to it. The proposed solution has
> > got to be more straightforward and less fragile.
> >
> > The numbers I've seen from some preliminary testing showed that the Java
> > impl took exponentially *less* time to process pdfs and was faster than the
> > ruby PP in every test. It's an OSGi bundle and written in Java like the
> > rest of our project which makes it easier to set up and maintain as we
> > write far more java code than ruby. I believe there's also already a setup
> > available to run the java PP as a standalone server.
> >
> > The Java version introduces a topia term extractor bundle which is a port
> > from the Python version. This is a point of maintenance to consider but the
> > python code has not changed in years. It's a common impl for other
> > languages to port but there wasn't a java version around. I would like to
> > see this code find a permanent home in a relevant OSS project. At the very
> > least it should be maintained apart from OAE core to make it available to a
> > broader audience.
> >
> > +1 to getting this code wrapped up and reviewed.
> >
> > On Wed, Jul 18, 2012 at 1:51 PM, Christian Vuerings
> > <vueringschrist...@gmail.com> wrote:
> > > I'm not sure whether this is already part of the criteria list or not,
> > > but what about CPU/Memory usage?
> > > Is there a way we can measure that and compare it to the current ruby
> > > based PP?
> > > When I currently run the ruby PP locally, it's usually one of the
> > > processes that uses the most resources.
> > >
> > > One other thing I'm curious about is how well it will compress/handle the
> > > different file formats (png/jpg/gif/psd).
> > >
> > > These are just 2 things that I'm interested in since they (can) have an
> > > impact on the overall performance.
> > >
> > > - Christian
> > >
> > > On Jul 18, 2012, at 12:41 PM, Lance Speelmon wrote:
> > > > Does anyone have an opinion about adopting the new java based PP?
> > > > Specifically can you articulate acceptance criteria for such an
> > > > adoption? e.g.
> > > >
> > > > Must support same preview behaviors as existing ruby-based PP.
> > > > Must pass QA with all blocker and critical items resolved.
> > > > Must start automatically OOTB to support the tire-kicking, web-start
> > > > uses.
> > > > Must leverage as much 3rd party code as possible to minimize ownership
> > > > costs.
> > > > Must pass code review.
> > > > Unit test code coverage.
> > > > Basic config and deployment documentation.
> > > >
> > > > What is missing? Anything? Thanks, L
> > > >
> > > > On Jul 17, 2012, at 2:58 PM, Lance Speelmon <la...@rsmart.com> wrote:
> > > > > Is there any way to break this work down into chunks? e.g.
> > > > >
> > > > > 1. Adopt java PP as default PP moving forward. What are the
> > > > > acceptance criteria?
> > > > > 2. Enhance new java PP with message queue abilities.
> > > > >
> > > > > WDYT? Thanks, L
> > > > >
> > > > > On Jul 17, 2012, at 8:34 AM, Carl Hall <c...@hallwaytech.com> wrote:
> > > > > > Each app server could run its own queues but that wouldn't support
> > > > > > building a farm of PP processors unless we also teach them to talk
> > > > > > to multiple JMS servers. Maybe something like DNS round-robin would
> > > > > > suffice?
> > > > > >
> > > > > > On Tue, Jul 17, 2012 at 8:25 AM, Erik Froese
> > > > > > <erik.fro...@gmail.com> wrote:
> > > > > > > Do we need to cluster activemq? Can't each app server service its
> > > > > > > own queues?
> > > > > > > Erik
> > > > > > >
> > > > > > > On Tue, Jul 17, 2012 at 11:23 AM, Carl Hall
> > > > > > > <c...@hallwaytech.com> wrote:
> > > > > > > > What Erik describes has been on the dev wish list for a little
> > > > > > > > while now.
> > > > > > > > Moving to an event-driven model would allow us to build out
> > > > > > > > concurrency but there also comes the question of clustering
> > > > > > > > ActiveMQ.
> > > > > > > >
> > > > > > > > On Thu, Jul 12, 2012 at 6:27 AM, Erik Froese
> > > > > > > > <erik.fro...@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >> Hey David,
> > > > > > > >>
> > > > > > > >> The code is not clustered.
> > > > > > > >>
> > > > > > > >> You'd need to write an event listener that would fire when new
> > > > > > > >> content is uploaded. It would put the content ids on a JMS
> > > > > > > >> queue. Then implement a ContentFetcher that grabs a message off
> > > > > > > >> of the queue and wire that into the PPI. Events and Messages
> > > > > > > >> are not clustered in OAE (AFAIK) so this would have to be run
> > > > > > > >> on each app server.
> > > > > > > >>
> > > > > > > >> While we're in event-land it'd be nice to be able to regenerate
> > > > > > > >> a preview when a content body is updated. I'm not sure if this
> > > > > > > >> is possible yet.
> > > > > > > >>
> > > > > > > >> I'm not sure how we'd limit the CPU usage yet either. You could
> > > > > > > >> manage the quartz schedule that runs the PPI.
> > > > > > > >>
> > > > > > > >> We can also disable concurrent executions of the job.
> > > > > > > >>
> > > > > > > >> Erik
> > > > > > > >>
> > > > > > > >> On Wed, Jul 11, 2012 at 8:44 PM, Roma, David <dr...@csu.edu.au>
> > > > > > > >> wrote:
> > > > > > > >> > Awesome news Erik!
> > > > > > > >> >
> > > > > > > >> > Our Ops guys will be stoked when we can get this in. A couple
> > > > > > > >> > of questions from someone who hasn't looked at the code or
> > > > > > > >> > read too deeply:
> > > > > > > >> > - Does it support clustering?
> > > > > > > >> > - e.g. can we just run it side by side on each of our app
> > > > > > > >> > servers and they will play nice sharing out processing jobs?
> > > > > > > >> > - Will it affect performance of the app servers much? Can we
> > > > > > > >> > limit the preview processor to say 10% CPU and 500 MB RAM, or
> > > > > > > >> > low priority threads, or limit the number of items to process
> > > > > > > >> > or something? This would make for a nice simple deployment
> > > > > > > >> > that wouldn't threaten the app server stability.
> > > > > > > >> >
> > > > > > > >> > Cheers,
> > > > > > > >> > Dave.
> > > > > > > >> >
> > > > > > > >> > -----Original Message-----
> > > > > > > >> > From: oae-dev-boun...@collab.sakaiproject.org
> > > > > > > >> > [mailto:oae-dev-boun...@collab.sakaiproject.org] On Behalf
> > > > > > > >> > Of Erik Froese
> > > > > > > >> > Sent: Thursday, 12 July 2012 2:37 AM
> > > > > > > >> > To: Carl Hall
> > > > > > > >> > Cc: oae-dev@collab.sakaiproject.org; Clay Fenlason
> > > > > > > >> > Subject: Re: [oae-dev] Moving the preview processor to java
> > > > > > > >> >
> > > > > > > >> > Hey everyone,
> > > > > > > >> >
> > > > > > > >> > It's been a few months but I actually implemented the Java
> > > > > > > >> > preview processor as an OSGi bundle. I filed a ticket for it
> > > > > > > >> > [1].
> > > > > > > >> >
> > > > > > > >> > I'm not sure where to go from here. Is this something that
> > > > > > > >> > could be included POST 1.4.0?
> > > > > > > >> > Should I open a PR so we can review the code? If so, PR
> > > > > > > >> > against which branch?
> > > > > > > >> >
> > > > > > > >> > Either way, have a look, give it a go. We'll probably wind up
> > > > > > > >> > using it at rSmart.
> > > > > > > >> >
> > > > > > > >> > Erik
> > > > > > > >> >
> > > > > > > >> > [1] https://jira.sakaiproject.org/browse/KERN-3021
> > > > > > > >> >
> > > > > > > >> > On Tue, Apr 17, 2012 at 9:09 AM, Carl Hall
> > > > > > > >> > <c...@hallwaytech.com> wrote:
> > > > > > > >> >> I totally agree that we should ally ourselves with other
> > > > > > > >> >> communities. I see where we get docsplit from
> > > > > > > >> >> DocumentCloud [1] and we use several other libraries for
> > > > > > > >> >> processing that they've most likely contributed to.
> > > > > > > >> >> The Java approach is very little custom code compared to the
> > > > > > > >> >> libraries we're getting from Apache (tika, sanselan,
> > > > > > > >> >> commons, pdfbox), so we would still be building on the
> > > > > > > >> >> shoulders of our friendly community giants.
> > > > > > > >> >>
> > > > > > > >> >> [1] https://github.com/documentcloud/docsplit
> > > > > > > >> >>
> > > > > > > >> >> On Sat, Apr 14, 2012 at 5:43 AM, John Norman
> > > > > > > >> >> <j...@caret.cam.ac.uk> wrote:
> > > > > > > >> >>>
> > > > > > > >> >>> My recollection (perhaps wrong) is that we got this from
> > > > > > > >> >>> Document Cloud and I /think/ Chris Roby found it. Document
> > > > > > > >> >>> Cloud seems a very relevant and valuable project. If we
> > > > > > > >> >>> were able to help them while helping ourselves, other good
> > > > > > > >> >>> things could come from the relationship. My general point
> > > > > > > >> >>> is that we are thin on resources and so, in principle,
> > > > > > > >> >>> symbiotic relationships are helpful.
> > > > > > > >> >>>
> > > > > > > >> >>> http://www.documentcloud.org/home
> > > > > > > >> >>>
> > > > > > > >> >>> John
> > > > > > > >> >>>
> > > > > > > >> >>> Sent from my iPad
> > > > > > > >> >>>
> > > > > > > >> >>> On 13 Apr 2012, at 17:03, Carl Hall <c...@hallwaytech.com>
> > > > > > > >> >>> wrote:
> > > > > > > >> >>>
> > > > > > > >> >>> I agree with Daniel that our modifications to the preview
> > > > > > > >> >>> processor have put its ownership square on us. Was there a
> > > > > > > >> >>> community that this script was borrowed from? I thought it
> > > > > > > >> >>> was original development that uses various external
> > > > > > > >> >>> libraries to do the actual work. This is the approach that
> > > > > > > >> >>> Erik is taking with the rewrite using things like Tika
> > > > > > > >> >>> (text extraction), Sanselan (images) and a Java port of the
> > > > > > > >> >>> python topia.termextract library.
> > > > > > > >> >>>
> > > > > > > >> >>> I certainly don't deny the speed of development that was
> > > > > > > >> >>> realized in creating the PP but the current state of the
> > > > > > > >> >>> code is a mess at best. Reuse of libraries in Java is
> > > > > > > >> >>> showing a fast rewrite with very little managed code on our
> > > > > > > >> >>> part.
> > > > > > > >> >>>
> > > > > > > >> >>> On Fri, Apr 13, 2012 at 12:50 AM, Daniel Parry
> > > > > > > >> >>> <dan...@caret.cam.ac.uk> wrote:
> > > > > > > >> >>>>
> > > > > > > >> >>>> On Thu, Apr 12, 2012 at 04:21:36PM -0400, Clay Fenlason
> > > > > > > >> >>>> wrote:
> > > > > > > >> >>>> > I think this response is at best orthogonal to the point
> > > > > > > >> >>>> > John's trying to raise, though I gather this kind of
> > > > > > > >> >>>> > reaction must come from a buildup of some real
> > > > > > > >> >>>> > frustration around the PP, which I don't mean to
> > > > > > > >> >>>> > discount. I also think John was pretty clear about what
> > > > > > > >> >>>> > he was suggesting: that there be a conversation with the
> > > > > > > >> >>>> > community we got the PP from, if the conversation hasn't
> > > > > > > >> >>>> > happened already, to see if there might still be a way
> > > > > > > >> >>>> > to work together before we decide to just own it
> > > > > > > >> >>>> > ourselves.
> > > > > > > >> >>>>
> > > > > > > >> >>>> I'd suggest the way that the preview processor was being
> > > > > > > >> >>>> extended (initially a python server add on, followed by a
> > > > > > > >> >>>> ruby rewrite for tag extraction) and the variety of ruby
> > > > > > > >> >>>> versions that deployers were using and the methods used to
> > > > > > > >> >>>> deploy it were indicative of a) the OAE community already
> > > > > > > >> >>>> 'owning' the PP and b) as has already been pointed out
> > > > > > > >> >>>> some standardization needed restoring and additional
> > > > > > > >> >>>> functionality added for deployers. Hence, the list was
> > > > > > > >> >>>> pinged[0] a while back to ask about standardizing and
> > > > > > > >> >>>> extending in java. I'm not sure of any other way to
> > > > > > > >> >>>> contact the original PP community or if such a community
> > > > > > > >> >>>> separate to OAE even still exists?
> > > > > > > >> >>>>
> > > > > > > >> >>>> Best wishes,
> > > > > > > >> >>>>
> > > > > > > >> >>>> Daniel
> > > > > > > >> >>>>
> > > > > > > >> >>>> [0]
> > > > > > > >> >>>> http://collab.sakaiproject.org/pipermail/oae-dev/2012-April/001677.html
> > > > > > > >> >>>>
> > > > > > > >> >>>> --
> > > > > > > >> >>>> --| Daniel Parry: dan...@caret.cam.ac.uk. www.caret.cam.ac.uk/ |--
> > > > > > > >> >>>> "Of all the things a leader should fear, complacency should
> > > > > > > >> >>>> head the list." [John C.
Maxwell]
> > > > > > > >> >>>> _______________________________________________
> > > > > > > >> >>>> oae-dev mailing list
> > > > > > > >> >>>> oae-dev@collab.sakaiproject.org
> > > > > > > >> >>>> http://collab.sakaiproject.org/mailman/listinfo/oae-dev
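Erik's outline earlier in this thread (an event listener puts content ids on a queue, and a ContentFetcher takes them off for the preview processor) could be sketched roughly as below. This is only an illustration in Python: `queue.Queue` stands in for the JMS queue, and `on_content_uploaded`, `ContentFetcher`, `fetch` and `run_once` are hypothetical names, not the API of the Java bundle or of OAE.

```python
import queue

# In-process stand-in for the JMS queue mentioned in the thread (assumption:
# a real deployment would use a message broker such as ActiveMQ instead).
content_ids = queue.Queue()


def on_content_uploaded(content_id):
    """Hypothetical event listener: called when new content is uploaded."""
    content_ids.put(content_id)


class ContentFetcher:
    """Hypothetical fetcher that hands queued content ids to the processor."""

    def __init__(self, source):
        self.source = source

    def fetch(self, max_items):
        """Return up to max_items ids without blocking on an empty queue."""
        batch = []
        while len(batch) < max_items:
            try:
                batch.append(self.source.get_nowait())
            except queue.Empty:
                break
        return batch


def run_once(fetcher, process, max_items=10):
    """One scheduled run: process a bounded batch and return the ids handled."""
    handled = []
    for content_id in fetcher.fetch(max_items):
        process(content_id)
        handled.append(content_id)
    return handled
```

Capping `max_items` per run is one simple way to bound how much work each scheduled execution does, which touches on David's question about load on the app servers; it does not limit CPU or memory by itself.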
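For the controlled timing tests Kent proposes at the top of the thread, a small harness along these lines could group processing time by file type and size. It is a sketch under assumptions: the size thresholds are arbitrary, `process` is whatever callable invokes the preview processor being measured (Ruby or Java), and none of these names come from the OAE code base.

```python
import time
from collections import defaultdict
from pathlib import Path
from statistics import mean

# Size thresholds in bytes (hypothetical values chosen for illustration).
SIZE_BUCKETS = [(100 * 1024, "small"), (5 * 1024 * 1024, "medium")]


def size_bucket(num_bytes):
    """Map a file size to a coarse label so results can be grouped."""
    for limit, label in SIZE_BUCKETS:
        if num_bytes < limit:
            return label
    return "large"


def benchmark(paths, process, clock=time.perf_counter):
    """Time process(path) per file; average by (extension, size bucket)."""
    timings = defaultdict(list)
    for path in map(Path, paths):
        key = (path.suffix.lower().lstrip("."), size_bucket(path.stat().st_size))
        start = clock()
        process(path)
        timings[key].append(clock() - start)
    return {key: mean(values) for key, values in timings.items()}
```

Running the same file set through both implementations and comparing the per-group averages would give the like-for-like comparison against the current implementation.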