Re: [jug-discussion] Searching large object graphs
Could be fun. I have some free time over the next week and need to get back into the programming game. More importantly, I think I could learn a bunch from other people's code :)

On Sun, 2 Jan 2005, Erik Hatcher wrote:

> On Jan 2, 2005, at 9:18 PM, Tim Colson wrote:
>
> >> I'm in if Tim wants to write a few unit tests that candidate
> >> implementations should turn green.
> >
> > With the holidays and new puppy, I haven't been responding to email as
> > quickly...
> >
> > So Erik/Bryan -- are you gents saying if I code up some dummy objects,
> > and then some junit tests with pseudo queries like: "all objects with a
> > name or skill containing java", then you gents would code up some
> > in-memory searches?
>
> That's what I'm saying! :)
>
> I can't promise quick turn-around time; it'd depend on how tricky
> you made your tests. But the "all objects with a name or skill
> containing java" one shouldn't take long (on the order of minutes to
> code given a setUp and testXXX method that already got me the data and
> expectations).
>
> Maybe something like this:
>
>   private ObjectManager om;
>
>   public void setUp() {
>     // create a Collection of objects
>     om = new ObjectManager(); // this will be the class I implement -
>                               // you could mock it to get the test to compile
>     om.add(collection);
>   }
>
>   public void testFindJava() {
>     Collection results = om.findNameOrSkillContaining("java");
>     // how do we phrase the query generically?
>     // with a Lucene implementation you could do "name:java OR skill:java"
>
>     // assert whatever you like about the returned objects - should they
>     // be in any particular order?
>   }
>
> Erik
>
> > I could be game for that.
> >
> > Tim

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: [jug-discussion] Searching large object graphs
On Jan 2, 2005, at 9:18 PM, Tim Colson wrote:

>> I'm in if Tim wants to write a few unit tests that candidate
>> implementations should turn green.
>
> With the holidays and new puppy, I haven't been responding to email as
> quickly...
>
> So Erik/Bryan -- are you gents saying if I code up some dummy objects,
> and then some junit tests with pseudo queries like: "all objects with a
> name or skill containing java", then you gents would code up some
> in-memory searches?

That's what I'm saying! :)

I can't promise quick turn-around time; it'd depend on how tricky you made your tests. But the "all objects with a name or skill containing java" one shouldn't take long (on the order of minutes to code, given a setUp and testXXX method that already got me the data and expectations).

Maybe something like this:

  private ObjectManager om;

  public void setUp() {
    // create a Collection of objects
    om = new ObjectManager(); // this will be the class I implement -
                              // you could mock it to get the test to compile
    om.add(collection);
  }

  public void testFindJava() {
    Collection results = om.findNameOrSkillContaining("java");
    // how do we phrase the query generically?
    // with a Lucene implementation you could do "name:java OR skill:java"

    // assert whatever you like about the returned objects - should they
    // be in any particular order?
  }

Erik

> I could be game for that.
>
> Tim
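For anyone who wants the test above to compile before a Lucene-backed version exists, here is a minimal sketch of a brute-force ObjectManager. It is not Erik's implementation. The Person class, its name and skills fields, and the case-insensitive substring matching are all assumptions made for illustration, and it uses generics and lambdas that were not available in 2005.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

/** Hypothetical data object; the thread never defines one. */
class Person {
    final String name;
    final List<String> skills;

    Person(String name, List<String> skills) {
        this.name = name;
        this.skills = skills;
    }
}

/** Brute-force stand-in for the ObjectManager named in the test sketch. */
class ObjectManager {
    private final List<Person> people = new ArrayList<>();

    void add(Collection<Person> collection) {
        people.addAll(collection);
    }

    /** Linear scan: case-insensitive substring match on the name or any skill. */
    Collection<Person> findNameOrSkillContaining(String text) {
        String needle = text.toLowerCase(Locale.ROOT);
        List<Person> results = new ArrayList<>();
        for (Person p : people) {
            boolean skillHit = p.skills.stream().anyMatch(s -> contains(s, needle));
            if (contains(p.name, needle) || skillHit) {
                results.add(p);
            }
        }
        return results; // insertion order; no relevance ranking
    }

    private static boolean contains(String value, String needle) {
        return value != null && value.toLowerCase(Locale.ROOT).contains(needle);
    }
}
```

A scan like this visits every object on every query, so it only answers the "how do I get the test green" question. Whether it is fast enough for 100K objects is exactly what the proposed head-to-head would have to show.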
RE: [jug-discussion] Searching large object graphs
> I'm in if Tim wants to write a few unit tests that candidate
> implementations should turn green.

With the holidays and new puppy, I haven't been responding to email as quickly...

So Erik/Bryan -- are you gents saying if I code up some dummy objects, and then some junit tests with pseudo queries like: "all objects with a name or skill containing java", then you gents would code up some in-memory searches?

I could be game for that.

Tim
Re: [jug-discussion] Searching large object graphs
Ahh, looking back on it, I did read it as a more general problem (there I go again, reading way too much into things). Yes, I agree that your suggestion is a very good one.

On Thu, 30 Dec 2004, Erik Hatcher wrote:

> On Dec 30, 2004, at 11:58 AM, [EMAIL PROTECTED] wrote:
>
> > O... Ok, that seems like fun (I know I am sick, but truth is I have
> > time to kill at home for the next week and a half). But we should also
> > have different kinds of common data, like a few hundred complete
> > personal records, a few books/blogs, etc. We could also see a
> > difference between memory-resident ODB structure and RDB structure.
> > For implementation time we should also try one technology we are
> > familiar with and one we are not, as implementation time is inversely
> > proportional to prior knowledge of the method used to implement.
> > Perhaps I can get more practice at Lucene.
>
> You're getting pretty carried away here! I am after simplicity -
> meeting what Tim's original question was about, nothing more. From
> what you just said, and what you say later, it sounds like you're
> expanding the requirements dramatically. I'm in if Tim wants to write
> a few unit tests that candidate implementations should turn green.
>
> > Also, am I the only one who has to deal with the Trak Everything
> > Objects? I ask because a few hundred tuples in a record is not
> > uncommon. It is also not uncommon to have them related to a few dozen
> > other entities, each of which may have 25-50 tuples. And the users
> > come up with wacky searches like "I want to know every person who has
> > ever been on a south Phoenix construction project with Tim after he
> > became a lead." I know there are some scary smart people on this list
> > (I am not necessarily one of them) and I would love to see some good
> > code.
>
> This vastly changes the landscape. This sounds like the job for an RDF
> engine (Kowari is the one I hear the most about).
>
> I'm not interested in building a mega catch-all kinda in-memory object
> store. Tim had one concrete example, and I said Lucene looked perfect
> for it. Lucene is awesome, but it's not the end solution for every
> conceivable scenario. If Tim's use cases are along the lines of the
> example he provided then I'm up for making whatever unit tests he comes
> up with pass with a Lucene implementation under the covers.
>
> Erik
Re: [jug-discussion] Searching large object graphs
I must also agree. Here is another example: say you need to get from point A to point B, a mile away. Is it better to walk or drive? Well, drive, of course. But if you do not have a car, or do not know how to drive, and cannot wait for a cab, then you're stuck walking. It takes ME less time to build a good-enough solution than it does to learn how to build a best solution, particularly if the issue is complex and must be understood by many people who will need to modify and maintain the code later. That, and I have this really obnoxious drive to keep my code as pure J2EE as possible. By the same token, I will use some black box stuff to save time (Reisn), but I hate to do it.

From the business side, if I give a presentation that says I can give you a search engine that will return a very good result in 45 seconds for $500.00, or I can give you something that will return all results in two seconds and display them in a categorized fashion for $5000.00, they will usually choose the first solution.

So I am not saying other technologies are bad; I am just saying I prefer to hand code searches for the large OODB I deal with. And there is no way I could beat Lucene. I could just implement something by hand faster than I could learn and implement Lucene. Now, if I had to do a fresh implementation more often, or had need of its power, then I would make the investment. However, for the occasional search through a few large, memory-resident objects, I could do it with an acceptable speed variance.

Also, Richard said he did not have time to read my email, so I find it odd that he would have time to learn a whole new tech. Which takes me back to my point that hand coding in J2EE is easier. But then again, I like hand coding J2EE, and I wish I could learn more about really hard-core system stuff, like what else I can do with the robot, or how I can get system idle times, or capture a desktop, etc. So far all I have found are J-C++ black box hacks that are seriously system dependent, and I do not care to mess with that. (Again, I am not saying Lucene or any other aforementioned tech is that way; this is just to show where I am coming from and that I like hand coding.)

And since I somehow feel this is becoming a flame (it could just be the cold, foggy morning that makes me feel that way), I will back off, as my suggestion seems to have been taken as words from the devil. So I apologize if I misunderstood the original question and passed along a solution that you all think is really off the wall. That is why I offered a really short "Hey, I think I have this wrong, so here is the short version of what I feel" answer.

Well, back to work :) Accounting, not programming :) I shall keep my novice opinions to myself for a while :)

On Thu, 30 Dec 2004, Drew Davidson wrote:

> Richard Hightower wrote:
>
> >I agree. But what best are you talking about? The best technical
> >solution or the best business solution? The best business solution is
> >not always the best technical solution.
> >
> >(Mounting high horse...) Engineering is about tradeoffs: budget, time,
> >beer... Actually I just threw beer in there for fun.
> >
> >I will continue to focus on a good enough technical solution to fit the
> >customer's need.
> >
> >Actually, I will continue to play with technology that I am interested
> >in and tell the customer it is the best business solution (just
> >kidding). I will bile all technology I don't understand (if I don't
> >understand it... How can it be good?) Sorry, I was channeling the bile
> >blog :)
>
> I interpret "best" to mean the most comprehensive, extensible solution
> possible. "Good" is therefore something that works reasonably well for
> the purpose to which you are putting it. Simple as possible but no
> simpler.
>
> - Drew
>
> --
> Drew Davidson | OGNL Technology
> Email: [EMAIL PROTECTED]
> Web: http://www.ognl.org
> Vox: (520) 531-1966
> Fax: (520) 531-1965
> Mobile: (520) 405-2967
Re: [jug-discussion] Searching large object graphs
On Dec 30, 2004, at 11:58 AM, [EMAIL PROTECTED] wrote:

> O... Ok, that seems like fun (I know I am sick, but truth is I have
> time to kill at home for the next week and a half). But we should also
> have different kinds of common data, like a few hundred complete
> personal records, a few books/blogs, etc. We could also see a
> difference between memory-resident ODB structure and RDB structure.
> For implementation time we should also try one technology we are
> familiar with and one we are not, as implementation time is inversely
> proportional to prior knowledge of the method used to implement.
> Perhaps I can get more practice at Lucene.

You're getting pretty carried away here! I am after simplicity - meeting what Tim's original question was about, nothing more. From what you just said, and what you say later, it sounds like you're expanding the requirements dramatically. I'm in if Tim wants to write a few unit tests that candidate implementations should turn green.

> Also, am I the only one who has to deal with the Trak Everything
> Objects? I ask because a few hundred tuples in a record is not
> uncommon. It is also not uncommon to have them related to a few dozen
> other entities, each of which may have 25-50 tuples. And the users
> come up with wacky searches like "I want to know every person who has
> ever been on a south Phoenix construction project with Tim after he
> became a lead." I know there are some scary smart people on this list
> (I am not necessarily one of them) and I would love to see some good
> code.

This vastly changes the landscape. This sounds like the job for an RDF engine (Kowari is the one I hear the most about).

I'm not interested in building a mega catch-all kinda in-memory object store. Tim had one concrete example, and I said Lucene looked perfect for it. Lucene is awesome, but it's not the end solution for every conceivable scenario. If Tim's use cases are along the lines of the example he provided then I'm up for making whatever unit tests he comes up with pass with a Lucene implementation under the covers.

Erik
Re: [jug-discussion] Searching large object graphs
O... Ok, that seems like fun (I know I am sick, but truth is I have time to kill at home for the next week and a half). But we should also have different kinds of common data, like a few hundred complete personal records, a few books/blogs, etc. We could also see a difference between memory-resident ODB structure and RDB structure. For implementation time we should also try one technology we are familiar with and one we are not, as implementation time is inversely proportional to prior knowledge of the method used to implement. Perhaps I can get more practice at Lucene.

If we get a half dozen of us and share our code, it could be quite the learning experience as we find out which methods are comparable, which search methods are best on what kind of data, whether there is a good multiple-use tech that performs reasonably well on many kinds, etc. :) I'm down :)

Also, am I the only one who has to deal with the Trak Everything Objects? I ask because a few hundred tuples in a record is not uncommon. It is also not uncommon to have them related to a few dozen other entities, each of which may have 25-50 tuples. And the users come up with wacky searches like "I want to know every person who has ever been on a south Phoenix construction project with Tim after he became a lead." I know there are some scary smart people on this list (I am not necessarily one of them) and I would love to see some good code.

On Wed, 29 Dec 2004, Erik Hatcher wrote:

> On Dec 29, 2004, at 5:06 PM, [EMAIL PROTECTED] wrote:
>
> > 3) Lucene is a very good system IF you have the kind of loose data it
> > is coded for. However, if you have tight objects, the overhead it
> > spends in organizing its search is wasted. So, if your 100K object is,
> > say, a book with a half dozen attributes all containing similar data
> > types, then you're fine. If, however, your 100K object is a
> > development project with 250 attributes of mixed data types, then it
> > is not so good.
>
> Structured vs. unstructured searching is a very interesting topic.
> XQuery is well worth consideration here.
>
> I've found in the work I do that folks talk about true structured
> search, but when it comes to designing a search interface it becomes
> vastly more complex for users to comprehend how to formulate XPath or
> XQuery-like queries when what they really want to do is type in a
> couple of words and have the software present them with the best
> matches first. There is no doubt that indexing 250 different fields in
> Lucene is way extreme and not how it should be used. But again, this
> is generally not what folks *really* want.
>
> For the example that Tim provided, I still biasedly recommend Lucene :)
>
> In fact, should we have a head-to-head competition to implement
> different techniques? We could rate each implementation on 1) How long
> it took to implement 2) How fast the searches are. All we need is Tim
> to write up some unit tests that we can each work on making pass,
> including some JUnitPerf tests. I'm game.
>
> Erik
Re: [jug-discussion] Searching large object graphs
Rick and Drew have it right. A long time ago, in a galaxy far away, I attended the Xerox Professional Selling course, and the one important thing I learned was that people make the buying decision on three things, in this order: (1) the first solution they find for a problem, (2) the lowest perceived-risk solution to their problem, or (3) the “best” solution to their problem. The higher the cost, the higher the perceived risk; the newer the solution, the higher the perceived risk; etc.

If you pull a lot of doors and have low prices, you win a lot of business, and best doesn't matter.

Ollie

-----Original Message-----
From: "Richard Hightower" <[EMAIL PROTECTED]>
Date: Wed, 29 Dec 2004 20:48:50
To:
Subject: RE: [jug-discussion] Searching large object graphs

I agree. But what best are you talking about? The best technical solution or the best business solution? The best business solution is not always the best technical solution.

(Mounting high horse...) Engineering is about tradeoffs: budget, time, beer... Actually I just threw beer in there for fun.

I will continue to focus on a good enough technical solution to fit the customer's need.

Actually, I will continue to play with technology that I am interested in and tell the customer it is the best business solution (just kidding). I will bile all technology I don't understand (if I don't understand it... How can it be good?) Sorry, I was channeling the bile blog :)

BTW, did I mention that OGNL rocks?! DREW ROCKS!

I've got to get back to work! Later.

On a lighter note, my wireless keyboard and mouse went south, so I got the new Microsoft one with all of the bells and whistles. It works, and my keyboard has 25 extra keys. Oh well... It won't make me type faster or procrastinate any less.

-----Original Message-----
From: Drew Davidson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 29, 2004 10:23 PM
To: jug-discussion@tucson-jug.org
Subject: Re: [jug-discussion] Searching large object graphs

Richard Hightower wrote:

>I agree with Erik. I don't have time to read your long email, let alone
>implement a full-text search engine. I can't think of a single client
>that would rather have me beat my laptop with a rock than rent a
>pneumatic hammer and destroy it in several efficient seconds.

The best is the enemy of the good. Words to live by in contracting.

>On a lighter note, I just learned all about DocBook. And more
>importantly, I've got my wireless signal going all the way to my
>mobile-mini office.
>
>Belkin Pre-N Wireless Router covers my whole 5 acre lot with a strong
>signal with a lot of bandwidth. My laptop can pick up a signal on the
>complete 5 acres with its new Pre-N Wireless NIC. Belkin rocks, Linksys
>stinks.

On a related note, Rick is now in the process of growing a second head because of the increased signal strength.

- Drew

Mike Oliver
CTO, Alarius Systems LLC
Las Vegas, Nevada USA
Sent using my BlackBerry 6510 from Nextel
Re: [jug-discussion] Searching large object graphs
Richard Hightower wrote:

> I agree. But what best are you talking about? The best technical
> solution or the best business solution? The best business solution is
> not always the best technical solution.
>
> (Mounting high horse...) Engineering is about tradeoffs: budget, time,
> beer... Actually I just threw beer in there for fun.
>
> I will continue to focus on a good enough technical solution to fit the
> customer's need.
>
> Actually, I will continue to play with technology that I am interested
> in and tell the customer it is the best business solution (just
> kidding). I will bile all technology I don't understand (if I don't
> understand it... How can it be good?) Sorry, I was channeling the bile
> blog :)

I interpret "best" to mean the most comprehensive, extensible solution possible. "Good" is therefore something that works reasonably well for the purpose to which you are putting it. Simple as possible but no simpler.

- Drew

--
Drew Davidson | OGNL Technology
Email: [EMAIL PROTECTED]
Web: http://www.ognl.org
Vox: (520) 531-1966
Fax: (520) 531-1965
Mobile: (520) 405-2967
RE: [jug-discussion] Searching large object graphs
"The best is the enemy of the good."

U... E I just realized that you were agreeing with me. Scratch almost everything I said. DOH! I am lezdexic.

-----Original Message-----
From: Drew Davidson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 29, 2004 10:23 PM
To: jug-discussion@tucson-jug.org
Subject: Re: [jug-discussion] Searching large object graphs

Richard Hightower wrote:

>I agree with Erik. I don't have time to read your long email, let alone
>implement a full-text search engine. I can't think of a single client
>that would rather have me beat my laptop with a rock than rent a
>pneumatic hammer and destroy it in several efficient seconds.

The best is the enemy of the good. Words to live by in contracting.

>On a lighter note, I just learned all about DocBook. And more
>importantly, I've got my wireless signal going all the way to my
>mobile-mini office.
>
>Belkin Pre-N Wireless Router covers my whole 5 acre lot with a strong
>signal with a lot of bandwidth. My laptop can pick up a signal on the
>complete 5 acres with its new Pre-N Wireless NIC. Belkin rocks, Linksys
>stinks.

On a related note, Rick is now in the process of growing a second head because of the increased signal strength.

- Drew
RE: [jug-discussion] Searching large object graphs
I agree. But what best are you talking about? The best technical solution or the best business solution? The best business solution is not always the best technical solution.

(Mounting high horse...) Engineering is about tradeoffs: budget, time, beer... Actually I just threw beer in there for fun.

I will continue to focus on a good enough technical solution to fit the customer's need.

Actually, I will continue to play with technology that I am interested in and tell the customer it is the best business solution (just kidding). I will bile all technology I don't understand (if I don't understand it... How can it be good?) Sorry, I was channeling the bile blog :)

BTW, did I mention that OGNL rocks?! DREW ROCKS!

I've got to get back to work! Later.

On a lighter note, my wireless keyboard and mouse went south, so I got the new Microsoft one with all of the bells and whistles. It works, and my keyboard has 25 extra keys. Oh well... It won't make me type faster or procrastinate any less.

-----Original Message-----
From: Drew Davidson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 29, 2004 10:23 PM
To: jug-discussion@tucson-jug.org
Subject: Re: [jug-discussion] Searching large object graphs

Richard Hightower wrote:

>I agree with Erik. I don't have time to read your long email, let alone
>implement a full-text search engine. I can't think of a single client
>that would rather have me beat my laptop with a rock than rent a
>pneumatic hammer and destroy it in several efficient seconds.

The best is the enemy of the good. Words to live by in contracting.

>On a lighter note, I just learned all about DocBook. And more
>importantly, I've got my wireless signal going all the way to my
>mobile-mini office.
>
>Belkin Pre-N Wireless Router covers my whole 5 acre lot with a strong
>signal with a lot of bandwidth. My laptop can pick up a signal on the
>complete 5 acres with its new Pre-N Wireless NIC. Belkin rocks, Linksys
>stinks.

On a related note, Rick is now in the process of growing a second head because of the increased signal strength.

- Drew
Re: [jug-discussion] Searching large object graphs
Richard Hightower wrote:

> I agree with Erik. I don't have time to read your long email, let alone
> implement a full-text search engine. I can't think of a single client
> that would rather have me beat my laptop with a rock than rent a
> pneumatic hammer and destroy it in several efficient seconds.

The best is the enemy of the good. Words to live by in contracting.

> On a lighter note, I just learned all about DocBook. And more
> importantly, I've got my wireless signal going all the way to my
> mobile-mini office.
>
> Belkin Pre-N Wireless Router covers my whole 5 acre lot with a strong
> signal with a lot of bandwidth. My laptop can pick up a signal on the
> complete 5 acres with its new Pre-N Wireless NIC. Belkin rocks, Linksys
> stinks.

On a related note, Rick is now in the process of growing a second head because of the increased signal strength.

- Drew

--
Drew Davidson | OGNL Technology
Email: [EMAIL PROTECTED]
Web: http://www.ognl.org
Vox: (520) 531-1966
Fax: (520) 531-1965
Mobile: (520) 405-2967
RE: [jug-discussion] Searching large object graphs
Re: "It seems easier to re-invent a full-text search engine? I'd be way impressed if you could beat Lucene!"

I agree with Erik. I don't have time to read your long email, let alone implement a full-text search engine. I can't think of a single client that would rather have me beat my laptop with a rock than rent a pneumatic hammer and destroy it in several efficient seconds.

On a lighter note, I just learned all about DocBook. And more importantly, I've got my wireless signal going all the way to my mobile-mini office.

Belkin Pre-N Wireless Router covers my whole 5 acre lot with a strong signal with a lot of bandwidth. My laptop can pick up a signal on the complete 5 acres with its new Pre-N Wireless NIC. Belkin rocks, Linksys stinks.

I just remembered that Cisco is a client of mine... Hmmm. Linksys is not as good. I am sure it will be better in the next release. How is that?

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 29, 2004 8:41 PM
To: jug-discussion@tucson-jug.org
Subject: Re: [jug-discussion] Searching large object graphs

On Dec 29, 2004, at 3:12 PM, [EMAIL PROTECTED] wrote:

> Not to be trite... But why not just use bean objects to a backend DB?
> Or, for that matter, hand write the old incremental sort and sorted
> search routines. If it is all in memory then you should be able to hand
> write an index system capable of running through thousands of records
> in a fraction of a second... Just seems easier...

It seems easier to re-invent a full-text search engine? I'd be way impressed if you could beat Lucene!

Given the example query Tim provided, you'd be able to do this using Lucene in only a handful of lines of code.

Erik

> On Thu, 23 Dec 2004, Erik Hatcher wrote:
>
>> Lucene
>>
>> The query would be this "name:olson OR email:olson" if you indexed
>> that information into separate fields. A common technique is to
>> index all data you want queryable also into an aggregate field, in
>> which case the query could simply be "olson".
>>
>> The full source code to Lucene in Action is at
>> http://www.manning.com/hatcher2 - the ebook is available. The
>> physical book is shipping from the printers as we speak (UPS tracking
>> says I should have gotten my batch yesterday, but it'll be today it
>> seems).
>> http://www.lucenebook.com will go live within the week searching
>> *inside* the book as well as a blog system I'm setting up.
>>
>> Erik
>>
>> On Dec 22, 2004, at 10:27 PM, Tim Colson wrote:
>>
>>> So just assume for a moment that RAM is cheap and you decided to
>>> load 100K objects into memory. Assume those objects were
>>> "Employees"... you can imagine the fields would be the usual
>>> suspects. Assume each employee is associated with a profile that is
>>> another object, which is composed of a bunch of other data objects.
>>>
>>> What would you use to find/select objects like "Name or email foo
>>> matches *olson* " ?
>>>
>>> Some possibilities:
>>> http://jakarta.apache.org/commons/jxpath/
>>>
>>> Some of the stuff inside Commons:
>>> http://jakarta.apache.org/commons/collections/
>>>
>>> Lucene indexes
>>> http://jakarta.apache.org/lucene/docs/
>>>
>>> Others?
>>>
>>> Tim
Re: [jug-discussion] Searching large object graphs
On Dec 29, 2004, at 5:06 PM, [EMAIL PROTECTED] wrote:

> 3) Lucene is a very good system IF you have the kind of loose data it
> is coded for. However, if you have tight objects, the overhead it
> spends in organizing its search is wasted. So, if your 100K object is,
> say, a book with a half dozen attributes all containing similar data
> types, then you're fine. If, however, your 100K object is a
> development project with 250 attributes of mixed data types, then it
> is not so good.

Structured vs. unstructured searching is a very interesting topic. XQuery is well worth consideration here.

I've found in the work I do that folks talk about true structured search, but when it comes to designing a search interface it becomes vastly more complex for users to comprehend how to formulate XPath or XQuery-like queries when what they really want to do is type in a couple of words and have the software present them with the best matches first. There is no doubt that indexing 250 different fields in Lucene is way extreme and not how it should be used. But again, this is generally not what folks *really* want.

For the example that Tim provided, I still biasedly recommend Lucene :)

In fact, should we have a head-to-head competition to implement different techniques? We could rate each implementation on 1) how long it took to implement and 2) how fast the searches are. All we need is Tim to write up some unit tests that we can each work on making pass, including some JUnitPerf tests. I'm game.

Erik
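For criterion 2, a rough number can be had without any test framework. The following is only a sketch of one way to do it. SearchTimer and averageNanos are hypothetical names, it does not use JUnitPerf, and simple wall-clock timing like this is easily skewed by JIT compilation and garbage collection, so it is good for comparing orders of magnitude at best.

```java
import java.util.function.Supplier;

/** Hypothetical helper for crude wall-clock timing of a search call. */
class SearchTimer {

    /**
     * Runs the search a few times to warm up, then returns the average
     * elapsed nanoseconds per call over the measured runs.
     */
    static long averageNanos(Supplier<?> search, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmups; i++) {
            search.get(); // results discarded; only here to let the JIT settle
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            search.get();
        }
        return (System.nanoTime() - start) / runs;
    }
}
```

Each candidate implementation would be wrapped in a lambda, for example one that calls its find method with the same query, and timed against the same data set.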
Re: [jug-discussion] Searching large object graphs
On Dec 29, 2004, at 3:12 PM, [EMAIL PROTECTED] wrote:

> Not to be trite... But why not just use bean objects to a backend DB?
> Or, for that matter, hand write the old incremental sort and sorted
> search routines. If it is all in memory then you should be able to hand
> write an index system capable of running through thousands of records
> in a fraction of a second... Just seems easier...

It seems easier to re-invent a full-text search engine? I'd be way impressed if you could beat Lucene!

Given the example query Tim provided, you'd be able to do this using Lucene in only a handful of lines of code.

Erik

> On Thu, 23 Dec 2004, Erik Hatcher wrote:
>
>> Lucene
>>
>> The query would be this "name:olson OR email:olson" if you indexed
>> that information into separate fields. A common technique is to
>> index all data you want queryable also into an aggregate field, in
>> which case the query could simply be "olson".
>>
>> The full source code to Lucene in Action is at
>> http://www.manning.com/hatcher2 - the ebook is available. The
>> physical book is shipping from the printers as we speak (UPS tracking
>> says I should have gotten my batch yesterday, but it'll be today it
>> seems).
>> http://www.lucenebook.com will go live within the week searching
>> *inside* the book as well as a blog system I'm setting up.
>>
>> Erik
>>
>> On Dec 22, 2004, at 10:27 PM, Tim Colson wrote:
>>
>>> So just assume for a moment that RAM is cheap and you decided to
>>> load 100K objects into memory. Assume those objects were
>>> "Employees"... you can imagine the fields would be the usual
>>> suspects. Assume each employee is associated with a profile that is
>>> another object, which is composed of a bunch of other data objects.
>>>
>>> What would you use to find/select objects like "Name or email foo
>>> matches *olson* " ?
>>>
>>> Some possibilities:
>>> http://jakarta.apache.org/commons/jxpath/
>>>
>>> Some of the stuff inside Commons:
>>> http://jakarta.apache.org/commons/collections/
>>>
>>> Lucene indexes
>>> http://jakarta.apache.org/lucene/docs/
>>>
>>> Others?
>>>
>>> Tim
Re: [jug-discussion] Searching large object graphs
Yeah, I need to write my own metadata to the files and then retrieve it to perform searches. Thanks for the info. I will check out your source and the Geocities site this weekend and go from there. If I figure something out I will post the answer :)

Yeah! Searchable image archives for everyone!

On Wed, 29 Dec 2004, Drew Davidson wrote:
> [EMAIL PROTECTED] wrote:
>
> >While we are on the subject, I am looking for a more standard way of
> >incorporating meta data into a JPG (Currently I do a preparatory insert in the
> >JPG code and do a search on the whole code using tricks similar to those in
> >Lucene, however this is hardly ideal)
> >Any Jpeg people out there?
>
> JPEG has EXIF metadata for storing boatloads of information about the
> image; I believe (not exactly sure, though) that you can put your own
> custom information in there. Is that what you are asking?
>
> Java library for extraction of standard metadata:
>
> http://www.drewnoakes.com/code/exif/
>
> As far as writing the JPEG out with metadata, I don't have any resources
> right off the top of my head.
>
> A good reference for stuff about JPEG:
>
> http://www.geocities.com/marcoschmidt.geo/jpeg-image-file-format.html
>
> And, of course an excellent reference to everything you ever wanted to
> know about JPEG, EXIF metadata and the like:
>
> http://www.google.com
>
> :-)
>
> - Drew
Re: [jug-discussion] Searching large object graphs
[EMAIL PROTECTED] wrote:
> While we are on the subject, I am looking for a more standard way of
> incorporating meta data into a JPG (Currently I do a preparatory insert in
> the JPG code and do a search on the whole code using tricks similar to
> those in Lucene, however this is hardly ideal)
> Any Jpeg people out there?

JPEG has EXIF metadata for storing boatloads of information about the image; I believe (not exactly sure, though) that you can put your own custom information in there. Is that what you are asking?

Java library for extraction of standard metadata:

http://www.drewnoakes.com/code/exif/

As far as writing the JPEG out with metadata, I don't have any resources right off the top of my head.

A good reference for stuff about JPEG:

http://www.geocities.com/marcoschmidt.geo/jpeg-image-file-format.html

And, of course an excellent reference to everything you ever wanted to know about JPEG, EXIF metadata and the like:

http://www.google.com

:-)

- Drew

--
Drew Davidson | OGNL Technology
Email: [EMAIL PROTECTED]
Web: http://www.ognl.org
Vox: (520) 531-1966
Fax: (520) 531-1965
Mobile: (520) 405-2967
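For anyone who wants to see where such metadata physically sits, here is a minimal sketch in plain Java, not taken from the library linked above. It relies on the general JPEG layout: the file opens with the SOI marker (0xFFD8), followed by segments that each carry a marker byte and a two-byte big-endian length. EXIF data normally lives in an APP1 segment (0xFFE1) whose payload starts with "Exif" and two zero bytes, and free-text comments can go in a COM segment (0xFFFE). The class and method names (`JpegSegments`, `read`, `isExif`, `commentText`) are made up for illustration, and the sketch assumes every marker before the image data has a length field.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Lists the header segments of a JPEG byte stream (illustrative sketch only). */
final class JpegSegments {

    /** One segment: the marker byte (e.g. 0xE1 for APP1) and its payload. */
    static final class Segment {
        final int marker;
        final byte[] payload;

        Segment(int marker, byte[] payload) {
            this.marker = marker;
            this.payload = payload;
        }
    }

    /** Walks the segments between SOI and SOS/EOI; throws on a malformed header. */
    static List<Segment> read(byte[] jpeg) {
        if (jpeg.length < 2 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            throw new IllegalArgumentException("not a JPEG: missing SOI marker");
        }
        List<Segment> out = new ArrayList<>();
        int pos = 2;
        while (pos + 4 <= jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                throw new IllegalArgumentException("expected a marker at offset " + pos);
            }
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xFF) {
                pos++; // fill byte in front of the real marker
                continue;
            }
            if (marker == 0xDA || marker == 0xD9) {
                break; // SOS (compressed image data follows) or EOI
            }
            // The length counts its own two bytes but not the two marker bytes.
            int length = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            if (length < 2 || pos + 2 + length > jpeg.length) {
                throw new IllegalArgumentException("bad segment length at offset " + pos);
            }
            out.add(new Segment(marker, Arrays.copyOfRange(jpeg, pos + 4, pos + 2 + length)));
            pos += 2 + length;
        }
        return out;
    }

    /** True for an APP1 segment whose payload starts with the "Exif\0\0" identifier. */
    static boolean isExif(Segment s) {
        byte[] id = {'E', 'x', 'i', 'f', 0, 0};
        return s.marker == 0xE1 && s.payload.length >= id.length
                && Arrays.equals(Arrays.copyOf(s.payload, id.length), id);
    }

    /** Text of a COM segment, or null if the segment is not a comment. */
    static String commentText(Segment s) {
        return s.marker == 0xFE ? new String(s.payload, StandardCharsets.ISO_8859_1) : null;
    }

    public static void main(String[] args) {
        // Hand-built header: SOI, a COM segment holding "hi", then EOI.
        byte[] sample = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xFE, 0, 4, 'h', 'i',
                         (byte) 0xFF, (byte) 0xD9};
        for (Segment s : read(sample)) {
            System.out.println(Integer.toHexString(s.marker) + " " + commentText(s));
        }
    }
}
```

Reading is the easy half. Writing custom metadata back means inserting a new segment after SOI and copying the rest of the file unchanged, which this sketch does not attempt.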
RE: [jug-discussion] Searching large object graphs
Hmmm. OK, I like to pose hypothetical questions as well, so no big deal. As for "waste of time re-inventing the wheel": you do realize the wheel has been reinvented many, many times to suit the ever-changing needs of the vehicles that use it, right? :) If you don't believe me, do a patent search ;)

It is true my short answer did not provide enough detail to interest you, so I will attempt to explain more fully.

1) Multi-object search criteria are hardly time-consuming to write. Yes, even fully modular ones that allow multiple-object searches. If your code exceeds a few thousand lines in total, you're not allowing Java to do the work. Just pull the metadata from your object and construct the search from it, the same as you would if you were trying to create an SQL query. I personally like to construct open vectors of anonymous objects and do simple condition checking to implement a loose syntax. Thus not only can you use wildcards in the actual data search, but also on the portions of the objects being searched. This is a nice approach because the calling function needs to know only that something like what it wants exists somewhere in the object, but it also allows you to skip most of the condition checking if you know more about the object. And it can all be implemented in a black-box format, so the next code down the line does not have to understand the "Wheel" in order to use it.

2) If the issue is one of a few big objects requiring a fast search, then you actually lose efficiency with many forms of indexing, and B-tree optimization of the objects and corresponding indexes is a huge waste of overhead. In that case I would again use one of a variety of simple sort searches. Which ones and how to implement them of course depends on how many objects you are looking at and how often the data changes (do they still teach this stuff in programming 101?). I have actually written a number of good subsystems for use in regional nodes, like those used in very large state/national production-oriented DBs. (If you're wondering why, it was because bandwidth is still an issue and each region typically uses on the order of 95% regional-level data, but everyone still needs to be able to access everything at all times without knowing where it originated.)

3) Lucene is a very good system IF you have the kind of loose data it is coded for. However, if you have tight objects, the overhead it spends in organizing its search is wasted. So, if your 100K object is, say, a book with a half dozen attributes all containing similar data types, then you're fine. If however your 100K object is a development project with 250 attributes of mixed data types, then it is not so good.

This is why I suggested hand coding for your intended purpose. This way such pertinent questions as overall object structure, relation dependencies, data types, size of each attribute, etc., can be taken into account. BTW, I do this a lot as a freelance DB consultant.

While we are on the subject, I am looking for a more standard way of incorporating metadata into a JPG. (Currently I do a preparatory insert in the JPG code and do a search on the whole code using tricks similar to those in Lucene; however, this is hardly ideal.) Any JPEG people out there?

On Wed, 29 Dec 2004, Tim Colson wrote:
> >But why not just use bean objects to a backend DB.
>
> Well, how about because I explicitly posed the question as "just assume for a
> moment that RAM is cheap and you decided to load 100K objects into memory"
> instead of "I have a lot of data...what kind of thingy should I store it
> in... oh, and please reply with small words because I am developmentally
> challenged."
>
> Maybe that's why. ;-)
>
> > that matter hand write the old incremental sort and sorted search
> > routines.
>
> Apparently I wasn't clear -- I want to search using multiple criteria with
> wildcards/booleans on multiple fields, and on data in contained objects.
>
> Mostly I'd rather not waste time re-inventing wheels, and [usually] the
> folks on the list provide interesting food for thought.
>
> I won't bother with the flame-bait about "overly complex" and airguns.
>
> Cheers,
> Tim
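Point 1 above (pull the metadata from the object, allow wildcards on both the data and the part of the object being searched) might look roughly like the following in Java. This is only one possible reading of that description, not the poster's actual code: it uses reflection over declared fields, and `ReflectiveMatcher`, `Project`, `glob` and `matches` are hypothetical names invented for the sketch.

```java
import java.lang.reflect.Field;
import java.util.regex.Pattern;

/** Reflection-driven matcher with "*" wildcards on field names and on values. */
final class ReflectiveMatcher {

    /** Hypothetical sample type; any object with declared fields would do. */
    static final class Project {
        final String name;
        final String owner;
        final String notes;
        final int budget;

        Project(String name, String owner, String notes, int budget) {
            this.name = name;
            this.owner = owner;
            this.notes = notes;
            this.budget = budget;
        }
    }

    /** Minimal glob: "*" matches any run of characters, everything else is literal. */
    static boolean glob(String pattern, String text) {
        String[] pieces = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            // Quote each literal piece so regex metacharacters in it are harmless.
            regex.append(Pattern.quote(pieces[i]));
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE | Pattern.DOTALL)
                .matcher(text)
                .matches();
    }

    /** True if any declared field whose name matches fieldGlob has a value matching valueGlob. */
    static boolean matches(Object target, String fieldGlob, String valueGlob) {
        for (Field f : target.getClass().getDeclaredFields()) {
            if (f.isSynthetic() || !glob(fieldGlob, f.getName())) {
                continue; // wrong part of the object: skip the value check entirely
            }
            try {
                f.setAccessible(true);
                Object value = f.get(target);
                if (value != null && glob(valueGlob, String.valueOf(value))) {
                    return true;
                }
            } catch (IllegalAccessException e) {
                // Unreadable field: treat it as "no match" instead of failing the search.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Project p = new Project("Payroll rewrite", "T. Olson", "blocked on schema", 250);
        System.out.println(matches(p, "*", "*olson*"));  // search every field
        System.out.println(matches(p, "n*", "*olson*")); // only fields starting with "n"
    }
}
```

A caller who knows nothing about the object passes `"*"` for the field pattern. A caller who knows more narrows the pattern and avoids most of the value checks. The sketch looks at one level of fields only, so contained objects would need a recursive walk, and every query still touches every object.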
RE: [jug-discussion] Searching large object graphs
Or "Trust me..."

Michael Oliver
CTO
Alarius Systems LLC
3325 N. Nellis Blvd, #1
Las Vegas, NV 89115
Phone: (702) 643-7425
Fax: (520) 844-1036
*Note new email changed from [EMAIL PROTECTED]

-----Original Message-----
From: Richard Hightower [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 29, 2004 11:28 AM
To: jug-discussion@tucson-jug.org
Subject: RE: [jug-discussion] Searching large object graphs

Beware of any email that begins with the words "Not to be Trite...". You can feel a big wall of Trite flame coming around the corner. :)
RE: [jug-discussion] Searching large object graphs
Beware of any email that begins with the words "Not to be Trite...". You can feel a big wall of Trite flame coming around the corner. :)

-----Original Message-----
From: Tim Colson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 29, 2004 4:15 PM
To: jug-discussion@tucson-jug.org
Subject: RE: [jug-discussion] Searching large object graphs

>But why not just use bean objects to a backend DB.

Well, how about because I explicitly posed the question as "just assume for a moment that RAM is cheap and you decided to load 100K objects into memory" instead of "I have a lot of data...what kind of thingy should I store it in... oh, and please reply with small words because I am developmentally challenged."

Maybe that's why. ;-)

> that matter hand write the old incremental sort and sorted search
> routines.

Apparently I wasn't clear -- I want to search using multiple criteria with wildcards/booleans on multiple fields, and on data in contained objects.

Mostly I'd rather not waste time re-inventing wheels, and [usually] the folks on the list provide interesting food for thought.

I won't bother with the flame-bait about "overly complex" and airguns.

Cheers,
Tim
RE: [jug-discussion] Searching large object graphs
Oh my, I must have heard differently. I knew you were challenged and it had something to do with small, but I was way off base... ;-)

Michael Oliver
CTO
Alarius Systems LLC
3325 N. Nellis Blvd, #1
Las Vegas, NV 89115
Phone: (702) 643-7425
Fax: (520) 844-1036
*Note new email changed from [EMAIL PROTECTED]

-----Original Message-----
From: Tim Colson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 29, 2004 1:15 PM
To: jug-discussion@tucson-jug.org
Subject: RE: [jug-discussion] Searching large object graphs

>But why not just use bean objects to a backend DB.

Well, how about because I explicitly posed the question as "just assume for a moment that RAM is cheap and you decided to load 100K objects into memory" instead of "I have a lot of data...what kind of thingy should I store it in... oh, and please reply with small words because I am developmentally challenged."

Maybe that's why. ;-)
RE: [jug-discussion] Searching large object graphs
>But why not just use bean objects to a backend DB.

Well, how about because I explicitly posed the question as "just assume for a moment that RAM is cheap and you decided to load 100K objects into memory" instead of "I have a lot of data...what kind of thingy should I store it in... oh, and please reply with small words because I am developmentally challenged."

Maybe that's why. ;-)

> that matter hand write the old incremental sort and sorted search
> routines.

Apparently I wasn't clear -- I want to search using multiple criteria with wildcards/booleans on multiple fields, and on data in contained objects.

Mostly I'd rather not waste time re-inventing wheels, and [usually] the folks on the list provide interesting food for thought.

I won't bother with the flame-bait about "overly complex" and airguns.

Cheers,
Tim
RE: [jug-discussion] Searching large object graphs
I keep hitting my thumb with the Rock. I guess that is better than severing my limb with the pneumatic hammer.

Congrats on the book, Erik. Lucene seems really cool. I hope to work with it on a future project.

My limbs seem to grow back.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 29, 2004 3:12 PM
To: jug-discussion@tucson-jug.org
Subject: Re: [jug-discussion] Searching large object graphs

Not to be Trite... But why not just use bean objects to a backend DB. Or for that matter hand write the old incremental sort and sorted search routines. If it is all in memory then you should be able to hand write an index system capable of running through thousands of records in a fraction of a second...

Just seems easier... but then again I am not a CSE, so I don't get a lot of joy out of using the overly complex to do the overly simple just so I can learn about the overly complex for no more reason than I may need it later.

Or more simply, for me it is easier to hammer one loose nail with a nearby rock than to set up a pneumatic nail gun.
Re: [jug-discussion] Searching large object graphs
Not to be Trite... But why not just use bean objects to a backend DB. Or for that matter hand write the old incremental sort and sorted search routines. If it is all in memory then you should be able to hand write an index system capable of running through thousands of records in a fraction of a second...

Just seems easier... but then again I am not a CSE, so I don't get a lot of joy out of using the overly complex to do the overly simple just so I can learn about the overly complex for no more reason than I may need it later.

Or more simply, for me it is easier to hammer one loose nail with a nearby rock than to set up a pneumatic nail gun.

On Thu, 23 Dec 2004, Erik Hatcher wrote:
> Lucene
>
> The query would be this "name:olson OR email:olson" if you indexed that
> information into separate fields. A common technique is to index all
> data you want queryable also into an aggregate field in which case the
> query could simply be "olson".
>
> The full source code to Lucene in Action is at
> http://www.manning.com/hatcher2 - the ebook is available. The physical
> book is shipping from the printers as we speak (UPS tracking says I
> should have gotten my batch yesterday, but it'll be today it seems).
> http://www.lucenebook.com will go live within the week searching
> *inside* the book as well as a blog system I'm setting up.
>
> Erik
[jug-discussion] JavaOne CALL FOR PAPERS, was RE: [jug-discussion] Searching large object graphs
Hey Erik et al.

I am glad to hear your Lucene in Action book is going to the printers. I will order a copy ASAP.

BTW, JavaOne 2005 is doing a call for papers. I was thinking about signing up. You should think about it too. (The year I got accepted, I submitted 5 presentations, and they chose one b/c someone called in sick. They called me last minute. I spoke on XDoclet making EJB CMP/CMR easier. Shudder... Brrr...)

I plan on being in town (Tucson) for the next six weeks or so (plans subject to change). I am writing some articles for IBM and starting a book for O'Reilly in my down time (Drew and I are working on it together).

Sorry I missed you in VA. I wanted to get together the last week, but my schedule got crazy. When are you coming to Tucson?

I better get to work. There is no persecution like staring at a blank page.

BTW, are there any Eclipse plugin/SWT experts in Tucson who would not mind traveling a bit to LA?

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 23, 2004 3:05 AM
To: jug-discussion@tucson-jug.org
Subject: Re: [jug-discussion] Searching large object graphs

Lucene

The query would be this "name:olson OR email:olson" if you indexed that information into separate fields. A common technique is to index all data you want queryable also into an aggregate field in which case the query could simply be "olson".

The full source code to Lucene in Action is at http://www.manning.com/hatcher2 - the ebook is available. The physical book is shipping from the printers as we speak (UPS tracking says I should have gotten my batch yesterday, but it'll be today it seems). http://www.lucenebook.com will go live within the week searching *inside* the book as well as a blog system I'm setting up.

    Erik
Re: [jug-discussion] Searching large object graphs
Lucene

The query would be this "name:olson OR email:olson" if you indexed that information into separate fields. A common technique is to index all data you want queryable also into an aggregate field in which case the query could simply be "olson".

The full source code to Lucene in Action is at http://www.manning.com/hatcher2 - the ebook is available. The physical book is shipping from the printers as we speak (UPS tracking says I should have gotten my batch yesterday, but it'll be today it seems). http://www.lucenebook.com will go live within the week searching *inside* the book as well as a blog system I'm setting up.

    Erik

On Dec 22, 2004, at 10:27 PM, Tim Colson wrote:
> So just assume for a moment that RAM is cheap and you decided to load
> 100K objects into memory. Assume those objects were "Employees"... you
> can imagine the fields would be the usual suspects. Assume each employee
> is associated with a profile that is another object, which is composed of
> a bunch of other data objects.
>
> What would you use to find/select objects like "Name or email foo matches
> *olson* " ?
>
> Some possibilities:
> http://jakarta.apache.org/commons/jxpath/
>
> Some of the stuff inside Commons:
> http://jakarta.apache.org/commons/collections/
>
> Lucene indexes
> http://jakarta.apache.org/lucene/docs/
>
> Others?
>
> Tim
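The difference between the two query styles in the reply above (one lookup per field joined with OR, versus a single lookup against an aggregate field) can be shown with a toy inverted index. To be clear, this is plain Java and not the Lucene API. `TinyFieldIndex`, the `"all"` field name and the tokenizing rule are assumptions made for the sketch, and real Lucene adds analysis, scoring and much more.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index: NOT Lucene, only an illustration of per-field vs. aggregate lookups. */
final class TinyFieldIndex {
    /** Hypothetical name for the aggregate field that receives a copy of every term. */
    static final String ALL = "all";

    // Maps "field:term" to the sorted ids of the documents containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Indexes one field value under its own field and under the aggregate field. */
    void add(int docId, String field, String value) {
        for (String term : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (term.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            post(field + ":" + term, docId);
            post(ALL + ":" + term, docId);
        }
    }

    private void post(String key, int docId) {
        postings.computeIfAbsent(key, k -> new TreeSet<>()).add(docId);
    }

    /** Exact-term lookup within one field; returns an empty set when nothing matches. */
    Set<Integer> lookup(String field, String term) {
        Set<Integer> ids = postings.get(field + ":" + term.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.<Integer>emptySet() : ids;
    }

    /** Union of two result sets, i.e. the OR in "name:olson OR email:olson". */
    static Set<Integer> or(Set<Integer> left, Set<Integer> right) {
        Set<Integer> union = new TreeSet<>(left);
        union.addAll(right);
        return union;
    }

    public static void main(String[] args) {
        TinyFieldIndex index = new TinyFieldIndex();
        index.add(1, "name", "Tim Olson");
        index.add(1, "email", "tim@example.com");
        index.add(2, "name", "Ann Lee");
        index.add(2, "email", "olson@example.com");
        index.add(3, "name", "Bo Ray");
        index.add(3, "email", "bo@example.com");

        // Two per-field lookups combined with OR...
        System.out.println(or(index.lookup("name", "olson"), index.lookup("email", "olson")));
        // ...versus a single lookup against the aggregate field.
        System.out.println(index.lookup(ALL, "olson"));
    }
}
```

Both lines print `[1, 2]`. The aggregate field costs extra index space in exchange for simpler queries. Note that the sketch matches whole terms only, so "olson" would not find an address like "jolson@example.com".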
Re: [jug-discussion] Searching large object graphs
OGNL

Nick

--- Tim Colson <[EMAIL PROTECTED]> wrote:
> So just assume for a moment that RAM is cheap and you decided to load
> 100K objects into memory. Assume those objects were "Employees"... you can
> imagine the fields would be the usual suspects. Assume each employee is
> associated with a profile that is another object, which is composed of a
> bunch of other data objects.
>
> What would you use to find/select objects like "Name or email foo matches
> *olson* " ?
>
> Some possibilities:
> http://jakarta.apache.org/commons/jxpath/
>
> Some of the stuff inside Commons:
> http://jakarta.apache.org/commons/collections/
>
> Lucene indexes
> http://jakarta.apache.org/lucene/docs/
>
> Others?
>
> Tim
[jug-discussion] Searching large object graphs
So just assume for a moment that RAM is cheap and you decided to load 100K objects into memory. Assume those objects were "Employees"... you can imagine the fields would be the usual suspects. Assume each employee is associated with a profile that is another object, which is composed of a bunch of other data objects.

What would you use to find/select objects like "Name or email foo matches *olson* " ?

Some possibilities:
http://jakarta.apache.org/commons/jxpath/

Some of the stuff inside Commons:
http://jakarta.apache.org/commons/collections/

Lucene indexes
http://jakarta.apache.org/lucene/docs/

Others?

Tim
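As a baseline for the question above, the simplest thing that could work is a brute-force scan: visit every object and test the two fields. The sketch below assumes a hypothetical `Employee` class with only `name` and `email` (the profile objects are left out) and treats "matches *olson*" as a case-insensitive substring test. `EmployeeScan` and its method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Brute-force scan: no index, every employee is checked on every query. */
final class EmployeeScan {

    /** Hypothetical stand-in for the "Employee" objects in the question. */
    static final class Employee {
        final String name;
        final String email;

        Employee(String name, String email) {
            this.name = name;
            this.email = email;
        }
    }

    /** Case-insensitive "*needle*" match; a null field never matches. */
    static boolean containsIgnoreCase(String field, String needle) {
        return field != null
                && field.toLowerCase(Locale.ROOT).contains(needle.toLowerCase(Locale.ROOT));
    }

    /** Returns, in input order, every employee whose name OR email contains the needle. */
    static List<Employee> findNameOrEmailContaining(List<Employee> all, String needle) {
        List<Employee> hits = new ArrayList<>();
        for (Employee e : all) {
            if (containsIgnoreCase(e.name, needle) || containsIgnoreCase(e.email, needle)) {
                hits.add(e);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Employee> staff = Arrays.asList(
                new Employee("Pat Olson", "pat@example.com"),
                new Employee("Sam Reyes", "s.olson-reyes@example.com"),
                new Employee("Lee Tran", "lee@example.com"));
        for (Employee e : findNameOrEmailContaining(staff, "olson")) {
            System.out.println(e.name);
        }
    }
}
```

The cost grows linearly with the number of objects and fields per query. Whether that is acceptable for 100K objects depends on how often queries run and how deep the contained objects go, so it is worth measuring before reaching for an index.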