Re: Organizing the Lucene meetup (Was: ApacheCon US)

2009-10-19 Thread Stefan Groschupf

There is an initial schedule online at:
http://wiki.apache.org/lucene-java/LuceneAtApacheConUs2009
Isabel


I still plan to do the Katta introduction. Is someone officially maintaining the page, or should I just go ahead and remove the question mark myself?

Stefan



Re: [APACHECON] Planning

2009-06-17 Thread Stefan Groschupf

Hi Grant,
sorry, I lost track here - is there a list of accepted presentations somewhere?

Stefan


~~~
Hadoop training and consulting
http://www.scaleunlimited.com
http://www.101tec.com



On Jun 17, 2009, at 8:42 AM, Grant Ingersoll wrote:

Note, you may not have permission to view that page.  Sorry.  Not my  
call.


Also note that it is _MY_ understanding that airfare is no longer covered as part of the speaker package. Maybe others can confirm this. I'm not sure how this affects people's willingness to speak, but it is a downer in my mind. However, the ASF does have a Travel Assistance Committee that people can apply to for assistance. I don't know the details of that.



On Jun 17, 2009, at 10:42 AM, Grant Ingersoll wrote:

OK, we've been allotted 2 days for Lucene: http://wiki.apache.org/concom-planning/ParticipatingPmcs. More info later about the Calls for Presentations (CFPs).


Now we need to figure out what we are going to do.

Also, we need, asap, a description that satisfies:
snip
In order to get registration open ASAP, we need a promo-text for each
track. If you're copied on this email, I'll be nagging you for a text
for your project (listed below). If there's someone else I should nag
instead, please let me know. If you know who I should be nagging for
the last three tracks below, please let me know that too.

What should the promo text look like? We need 150-200 words, explaining

-what the track will cover (outline is fine, you don't need to have
abstracts and bios if you're not ready for that),
-who the intended audience is, and
-why people will want to attend/what they'll get out of it.

If you're planning something amazing, cool, new or exciting, we want
some information about that. Is there going to be a panel discussion
with some of the central project members telling people what to expect
in the widely-anticipated next release? How about a hands-on
masterclass with that really tricky part of the project that everyone
has trouble with? Or everything you need to know to decide which
technologies to use in which situations, and how to get the most out
of your limited resources?
/snip








ScaleCamp: get together the night before Hadoop Summit

2009-05-13 Thread Stefan Groschupf

Hi All,

We are planning a community event the night before the Hadoop Summit.
This BarCamp (http://en.wikipedia.org/wiki/BarCamp) event will be held at the same venue as the Summit (Santa Clara Marriott).

Refreshments will be served to encourage socializing.

To kick off conversations for the social part of the evening, we are offering people the opportunity to present an experience report on their project (in a 15 min presentation).
We have 12 slots max, in 3 parallel tracks. The focus should be on projects leveraging technologies from the Hadoop ecosystem.


Please join us and mingle with the rest of the Hadoop community.

To find out more about this event and to sign up, please visit:
http://www.scaleunlimited.com/events/scale_camp

Please submit your presentation here:
http://www.scaleunlimited.com/about-us/contact


Stefan
P.S. Please spread the word!
P.P.S. Apologies for the cross-posting.


Re: Lucene Performance and usage alternatives

2008-08-05 Thread Stefan Groschupf
An alternative is always to distribute the index across a set of servers. If you need to scale, I guess this is the only long-term perspective.
You can do your own homegrown Lucene distribution or look into existing ones.
I'm currently working on Katta (http://katta.wiki.sourceforge.net/) - there is no release yet, but we are in the QA and test cycles.
But there are others as well - Solr, for example, also provides distribution.
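The scatter/gather pattern behind such a distributed index can be sketched without any Lucene classes. The following is a hypothetical, stdlib-only illustration (the `Hit` class and shard callables stand in for remote searchers and their results): fan the query out to all shards in parallel, then keep only the globally best k hits by score.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical scatter/gather merge across index shards: each shard returns
// its own top hits; the merger keeps the globally best k by score.
public class ShardSearch {
    static final class Hit {
        final String docId; final float score;
        Hit(String docId, float score) { this.docId = docId; this.score = score; }
    }

    // Query every shard in parallel and merge the per-shard hit lists.
    static List<Hit> search(List<Callable<List<Hit>>> shards, int k)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        try {
            List<Future<List<Hit>>> futures = pool.invokeAll(shards);
            // Min-heap of size k: the weakest of the current top-k sits on top.
            PriorityQueue<Hit> best =
                    new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
            for (Future<List<Hit>> f : futures) {
                for (Hit h : f.get()) {
                    best.add(h);
                    if (best.size() > k) best.poll(); // drop the weakest hit
                }
            }
            List<Hit> merged = new ArrayList<>(best);
            merged.sort((a, b) -> Float.compare(b.score, a.score)); // best first
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Two fake shards standing in for remote Lucene searchers.
        Callable<List<Hit>> shard0 =
                () -> List.of(new Hit("photo0/a", 0.9f), new Hit("photo0/b", 0.2f));
        Callable<List<Hit>> shard1 = () -> List.of(new Hit("photo1/c", 0.7f));
        List<Hit> top = search(List.of(shard0, shard1), 2);
        System.out.println(top.get(0).docId + " " + top.get(1).docId);
        // prints: photo0/a photo1/c
    }
}
```

Katta and Solr's distributed search both layer failover, replica selection, and distributed deletes on top of this basic merge, which is where most of the real complexity lives.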


Stefan


On Aug 5, 2008, at 7:21 AM, ezer wrote:



I just made a program using the Java API of Lucene. It is working fine for my actual index size, but I am worried about performance with a bigger index and simultaneous user access.

1) I am worried about the fact of having to write the program in Java. I searched for alternatives like the C port, but I saw that the version is a little old and not many people seem to use it.

2) I am also thinking of compiling the code with gcj to generate native code and not use the JVM. Has anybody tried it? Could that be an advantage that would approximate the performance of a C program?

3) I won't use an application server; I will call the program directly from a PHP page. Is there any architecture model suggested for doing that? I mean for many users accessing the program. Won't starting one instance each time someone does a query, and opening the index, degrade the performance?
--
View this message in context: 
http://www.nabble.com/Lucene-Performance-and-usage-alternatives-tp18832162p18832162.html
Sent from the Lucene - General mailing list archive at Nabble.com.
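On point 3, the usual answer is to keep one long-lived JVM that holds the index open, and have the PHP page talk to it over a socket per request, rather than spawning a new process (and re-opening the index) for every query. A hypothetical minimal sketch of such a daemon, with the actual Lucene search stubbed out and the port number chosen arbitrarily:

```java
import java.io.*;
import java.net.*;

// Hypothetical long-running search daemon: PHP connects per request, sends a
// query line, and reads one result line back. The JVM (and the open index
// reader it would hold) stays warm across requests.
public class SearchDaemon {
    // Stub standing in for an IndexSearcher kept open for the daemon's lifetime.
    static String search(String query) {
        return "hits-for:" + query;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(7654)) {
            while (true) {
                // One request per connection: read a query line, answer, close.
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream()));
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    String query = in.readLine();
                    if (query != null) out.println(search(query));
                }
            }
        }
    }
}
```

From PHP, `fsockopen("localhost", 7654)` plus `fwrite`/`fgets` is enough on the client side; a production setup would add a thread pool and some request framing, but the key point is that the index is opened once, not per query.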




~~~
101tec Inc.
Menlo Park, California, USA
http://www.101tec.com




Re: Lucene-based Distributed Index Leveraging Hadoop

2008-04-03 Thread Stefan Groschupf

Should we start from scratch or with a code contribution?
Does someone still want to contribute their implementation?
I just noticed - too late though - that Ning already contributed the code to Hadoop. So I guess my question should be rephrased: what is the idea of moving this into its own project?




Re: [PROPOSAL] index server project

2006-11-06 Thread Stefan Groschupf

Hi,

do people think we are already at a stage where we can set up some basic infrastructure like a mailing list and wiki, and move the discussion to the new mailing list? Maybe set up an incubator project?


I would be happy to help with such basic tasks.

Stefan



Am 31.10.2006 um 22:03 schrieb Yonik Seeley:


On 10/30/06, Doug Cutting [EMAIL PROTECTED] wrote:

Yonik Seeley wrote:
 On 10/18/06, Doug Cutting [EMAIL PROTECTED] wrote:
We assume that, within an index, a file with a given name is written only once.

 Is this necessary, and will we need the lockless patch (that avoids
 renaming or rewriting *any* files), or is Lucene's current index
 behavior sufficient?

It's not strictly required, but it would make index synchronization a
lot simpler. Yes, I was assuming the lockless patch would be committed
to Lucene before this project gets very far.  Something more than that
would be required in order to keep old versions, but this could be as
simple as a Directory subclass that refuses to remove files for a time.
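The "refuses to remove files for a time" idea can be illustrated with a hypothetical, stdlib-only wrapper. The `FileStore` interface below is an invented stand-in for Lucene's `Directory` (which exposes a similar `deleteFile` method): deletions are recorded rather than executed, so readers of older index versions can still open the files until the retention window expires.

```java
import java.util.*;

// Invented stand-in for Lucene's Directory: only the delete path matters here.
interface FileStore {
    void deleteFile(String name);
}

// Wrapper that defers deletions: a file "deleted" less than retainMillis ago
// stays on disk so older index versions remain openable.
public class DeferredDeleteStore implements FileStore {
    private final FileStore delegate;
    private final long retainMillis;
    private final Map<String, Long> pending = new LinkedHashMap<>(); // name -> request time

    public DeferredDeleteStore(FileStore delegate, long retainMillis) {
        this.delegate = delegate;
        this.retainMillis = retainMillis;
    }

    @Override
    public synchronized void deleteFile(String name) {
        pending.putIfAbsent(name, System.currentTimeMillis()); // remember, don't delete yet
        purge();
    }

    // Really delete the files whose retention window has expired.
    public synchronized void purge() {
        long now = System.currentTimeMillis();
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (now - e.getValue() >= retainMillis) {
                delegate.deleteFile(e.getKey());
                it.remove();
            }
        }
    }

    public synchronized Set<String> pendingDeletes() {
        return new LinkedHashSet<>(pending.keySet());
    }
}
```

A time-based window is the simplest policy; a snapshot mechanism (as mentioned below) pins an explicit set of files instead, which is safer when a reader can hold a version open for an unbounded time.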


Or a snapshot (hard links) mechanism.
Lucene would also need a way to open a specific index version (rather
than just the latest), but I guess that could also be hacked into
Directory by hiding later segments files (assumes lockless is
committed).

It's unfortunate the master needs to be involved on every document add.


That should not normally be the case.


Ahh... I had assumed that id in the following method was document id:

 IndexLocation getUpdateableIndex(String id);

I see now it's index id.

But what is index id exactly?  Looking at the example API you laid
down, it must be a single physical index (as opposed to a logical
index).  In which case, is it entirely up to the client to manage
multi-shard indices?  For example, if we had a photo index broken
up into 3 shards, each shard would have a separate index id, and it
would be up to the client to know this and to query across the
different photo0, photo1, photo2 indices.  The master would
have no clue those indices were related.  Hmmm, that doesn't work
very well for deletes though.

It seems like there should be the concept of a logical index, that is
composed of multiple shards, and each shard has multiple copies.

Or were you thinking that a cluster would only contain a single
logical index, and hence all different index ids are simply different
shards of that single logical index?  That would seem to be consistent
with ClientToMasterProtocol .getSearchableIndexes() lacking an id
argument.


I was not imagining a real-time system, where the next query after a
document is added would always include that document.  Is that a
requirement?  That's harder.


Not real-time, but it would be nice if we kept it close to what Lucene
can currently provide.
Most people seem fine with a latency of minutes.

At this point I'm mostly trying to see if this functionality would meet
the needs of Solr, Nutch and others.



It depends on the project scope and how extensible things are.
It seems like the master would be a WAR, capable of running stand-alone.

What about index servers (slaves)?  Would this project include just
the interfaces to be implemented by Solr/Nutch nodes, some common
implementation code behind the interfaces in the form of a library, or
also complete standalone WARs?

I'd need to be able to extend the ClientToSlave protocol to add
additional methods for Solr (for passing in extra parameters and
returning various extra data such as facets, highlighting, etc).

Must we include a notion of document identity and/or document version in
the mechanism? Would that facilitate updates and coherency?


It doesn't need to be in the interfaces I don't think, so it depends
on the scope of the index server implementations.

-Yonik



~~~
101tec Inc.
search tech for web 2.1
Menlo Park, California
http://www.101tec.com