Re: Organizing the Lucene meetup (Was: ApacheCon US)
There is an initial schedule online at: http://wiki.apache.org/lucene-java/LuceneAtApacheConUs2009 Isabel, I still plan to do the Katta introduction. Is someone officially maintaining the page, or should I just go ahead and remove the question mark myself? Stefan
Re: [APACHECON] Planning
Hi Grant, sorry, I lost track here: is there a list of accepted presentations somewhere? Stefan

~~~ Hadoop training and consulting http://www.scaleunlimited.com http://www.101tec.com

On Jun 17, 2009, at 8:42 AM, Grant Ingersoll wrote: Note, you may not have permission to view that page. Sorry. Not my call. Also note that it is _MY_ understanding that airfare is no longer covered as part of the speaker package. Maybe others can confirm this. I'm not sure how this affects people's willingness to speak, but it is a downer in my mind. However, the ASF does have a Travel Assistance Committee that people can apply to for assistance. I don't know the details of that.

On Jun 17, 2009, at 10:42 AM, Grant Ingersoll wrote: OK, we've been allotted 2 days for Lucene: http://wiki.apache.org/concom-planning/ParticipatingPmcs . More later with info about the Calls for Presentations (CFPs). Now we need to figure out what we are going to do. Also, we need, asap, a description that satisfies:

snip
In order to get registration open ASAP, we need a promo text for each track. If you're copied on this email, I'll be nagging you for a text for your project (listed below). If there's someone else I should nag instead, please let me know. If you know who I should be nagging for the last three tracks below, please let me know that too. What should the promo text look like? We need 150-200 words, explaining:
- what the track will cover (an outline is fine, you don't need to have abstracts and bios if you're not ready for that),
- who the intended audience is, and
- why people will want to attend / what they'll get out of it.
If you're planning something amazing, cool, new or exciting, we want some information about that. Is there going to be a panel discussion with some of the central project members telling people what to expect in the widely anticipated next release? How about a hands-on masterclass on that really tricky part of the project that everyone has trouble with? Or everything you need to know to decide which technologies to use in which situations, and how to get the most out of your limited resources?
/snip
ScaleCamp: get together the night before Hadoop Summit
Hi All, We are planning a community event the night before the Hadoop Summit. This BarCamp (http://en.wikipedia.org/wiki/BarCamp) event will be held at the same venue as the Summit (Santa Clara Marriott). Refreshments will be served to encourage socializing. To get conversations started for the social part of the evening, we are offering people the opportunity to present an experience report on their project (in a 15 min presentation). We have 12 slots in 3 parallel tracks at most. The focus should be on projects leveraging technologies from the Hadoop ecosystem. Please join us and mingle with the rest of the Hadoop community. To find out more about this event and to sign up, please visit: http://www.scaleunlimited.com/events/scale_camp Please submit your presentation here: http://www.scaleunlimited.com/about-us/contact Stefan P.S. Please spread the word! P.P.S. Apologies for the cross-posting.
Re: Lucene Performance and usage alternatives
An alternative is always to distribute the index across a set of servers. If you need to scale, I guess this is the only long-term perspective. You can do your own home-grown Lucene distribution or look into an existing one. I'm currently working on Katta (http://katta.wiki.sourceforge.net/); there is no release yet, but we are in the QA and test cycles. There are others as well; Solr, for example, also provides distribution. Stefan

On Aug 5, 2008, at 7:21 AM, ezer wrote: I just made a program using the Java API of Lucene. It is working fine for my current index size, but I am worried about performance with a bigger index and simultaneous user access.
1) I am worried about having to write the program in Java. I searched for alternatives like the C port, but I saw that the version used is a little old and not many people seem to use it.
2) I am also thinking of compiling the code with gcj to generate native code and not use the JVM. Has anybody tried it? Could that be an advantage that approximates the performance of a C program?
3) I won't use an application server; I will call the program directly from a PHP page. Is there any suggested architecture model for doing that? I mean, with many users accessing the program, wouldn't starting one instance each time someone does a query and opening the index degrade performance?
-- View this message in context: http://www.nabble.com/Lucene-Performance-and-usage-alternatives-tp18832162p18832162.html Sent from the Lucene - General mailing list archive at Nabble.com.

~~~ 101tec Inc. Menlo Park, California, USA http://www.101tec.com
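On point 3), the usual answer is to open the index once and share a single IndexSearcher across all queries instead of starting a new instance per request. Below is a minimal sketch of that pattern, written against a Lucene 3.x-style Java API; the index path and the "content" field are made-up placeholders, and exact class names and constructors differ between Lucene versions.

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Keeps one IndexSearcher open for the lifetime of the process instead of
// re-opening the index for every query. "/path/to/index" is a placeholder.
public class SharedSearcher {

    private static IndexSearcher searcher;

    public static synchronized IndexSearcher get() throws Exception {
        if (searcher == null) {
            // Opening the index is the expensive part; do it once and reuse.
            searcher = new IndexSearcher(FSDirectory.open(new File("/path/to/index")), true);
        }
        return searcher;
    }

    public static void main(String[] args) throws Exception {
        QueryParser parser = new QueryParser(Version.LUCENE_30, "content",
                new StandardAnalyzer(Version.LUCENE_30));
        Query query = parser.parse("distributed index");

        TopDocs hits = get().search(query, 10);
        for (ScoreDoc hit : hits.scoreDocs) {
            Document doc = get().doc(hit.doc);
            System.out.println(hit.score + " : " + doc.get("content"));
        }
    }
}

For a PHP front end the same idea applies: keep a small long-running Java search service (or Solr) and have PHP talk to it over HTTP or a socket, rather than launching a JVM and opening the index for every page view.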
Re: Lucene-based Distributed Index Leveraging Hadoop
Should we start from scratch or with a code contribution? Does someone still want to contribute their implementation? I just noticed - too late though - that Ning already contributed the code to Hadoop. So I guess my question should be rephrased: what is the idea behind moving this into its own project?
Re: [PROPOSAL] index server project
Hi, do people think we are already at a stage where we can set up some basic infrastructure like a mailing list and wiki and move the discussion to the new mailing list? Maybe set up an incubator project? I would be happy to help with such basic tasks. Stefan

On 31.10.2006 at 22:03, Yonik Seeley wrote:

On 10/30/06, Doug Cutting [EMAIL PROTECTED] wrote: Yonik Seeley wrote: On 10/18/06, Doug Cutting [EMAIL PROTECTED] wrote: We assume that, within an index, a file with a given name is written only once. Is this necessary, and will we need the lockless patch (that avoids renaming or rewriting *any* files), or is Lucene's current index behavior sufficient? It's not strictly required, but it would make index synchronization a lot simpler. Yes, I was assuming the lockless patch would be committed to Lucene before this project gets very far. Something more than that would be required in order to keep old versions, but this could be as simple as a Directory subclass that refuses to remove files for a time. Or a snapshot (hard links) mechanism.

Lucene would also need a way to open a specific index version (rather than just the latest), but I guess that could also be hacked into Directory by hiding later segments files (assumes lockless is committed).

It's unfortunate the master needs to be involved on every document add. That should not normally be the case.

Ahh... I had assumed that id in the following method was the document id: IndexLocation getUpdateableIndex(String id); I see now it's the index id. But what is the index id exactly? Looking at the example API you laid down, it must be a single physical index (as opposed to a logical index). In which case, is it entirely up to the client to manage multi-shard indices? For example, if we had a photo index broken up into 3 shards, each shard would have a separate index id and it would be up to the client to know this, and to query across the different photo0, photo1, photo2 indices. The master would have no clue those indices were related. Hmmm, that doesn't work very well for deletes though. It seems like there should be the concept of a logical index that is composed of multiple shards, and each shard has multiple copies. Or were you thinking that a cluster would only contain a single logical index, and hence all the different index ids are simply different shards of that single logical index? That would seem to be consistent with ClientToMasterProtocol.getSearchableIndexes() lacking an id argument.

I was not imagining a real-time system, where the next query after a document is added would always include that document. Is that a requirement? That's harder.

Not real-time, but it would be nice if we kept it close to what Lucene can currently provide. Most people seem fine with a latency of minutes.

At this point I'm mostly trying to see if this functionality would meet the needs of Solr, Nutch and others.

It depends on the project scope and how extensible things are. It seems like the master would be a WAR, capable of running stand-alone. What about index servers (slaves)? Would this project include just the interfaces to be implemented by Solr/Nutch nodes, some common implementation code behind the interfaces in the form of a library, or also complete standalone WARs? I'd need to be able to extend the ClientToSlave protocol to add additional methods for Solr (for passing in extra parameters and returning various extra data such as facets, highlighting, etc).

Must we include a notion of document identity and/or document version in the mechanism? Would that facilitate updates and coherency?

It doesn't need to be in the interfaces I don't think, so it depends on the scope of the index server implementations. -Yonik

~~~ 101tec Inc. search tech for web 2.1 Menlo Park, California http://www.101tec.com
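To make the logical-index versus shard discussion above a bit more concrete, here is a rough Java sketch of how the pieces could hang together. Only getUpdateableIndex and getSearchableIndexes come from the thread; every other name is hypothetical, and the real proposal may end up looking quite different.

import java.util.List;

// Sketch only: a "logical index" (e.g. the photo index) is split into shards,
// and each shard can have several replicas living on different index servers.
interface ClientToMasterProtocol {

    // All logical indexes the cluster knows about.
    List<String> getSearchableIndexes();

    // The shards that make up one logical index; a client (or a search
    // front end) fans a query out across them and merges the results.
    // Hypothetical method, not from the thread.
    List<ShardLocation> getShards(String logicalIndexId);

    // Which physical shard new documents for this logical index should go to.
    // The thread's version is IndexLocation getUpdateableIndex(String id);
    // the return type is renamed here for the sketch.
    ShardLocation getUpdateableIndex(String logicalIndexId);
}

// One replica of one shard, hosted on a concrete index server.
class ShardLocation {
    String logicalIndexId;  // e.g. "photos"
    String shardId;         // e.g. "photos-2"
    String host;            // index server holding this replica
    long version;           // would let a client pin a consistent index snapshot
}

With something along these lines the master could route deletes and updates per logical index, which speaks to the "doesn't work very well for deletes" concern, while a search client only needs the shard list to scatter queries and gather results.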