Re: Realtime Search for Social Networks Collaboration
On Sat, Sep 6, 2008 at 1:36 AM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> Regarding real-time search and Solr, my feeling is the focus should be on
> first adding real-time search to Lucene, and then we'll figure out how to
> incorporate that into Solr later.

Otis, what do you mean exactly by "adding real-time search to Lucene"? Note that Lucene, being an indexing/search library (and not a full-blown search engine), is by definition "real-time": once you add/write a document to the index it becomes immediately searchable, and once a document is logically deleted it is no longer returned in a search, though physical deletion happens during an index optimization.

Now, the problem of adding/deleting documents in bulk, as part of a transaction, and making these documents available for search immediately after the transaction is committed sounds more like a search engine problem (i.e. SOLR, Nutch, Ocean), especially if these transactions are known to be I/O expensive and thus are usually implemented as batched processes with some kind of sync mechanism, which makes them non-real-time.

For example, in my previous life, I designed and helped implement a quasi-realtime enterprise search engine using Lucene, having a set of multi-threaded indexers hitting a set of multiple indexes allocated across different search services which powered a broker-based distributed search interface. The most recent documents provided to the indexers were always added to the smaller in-memory (RAM) indexes, which usually could absorb the load of a bulk "add" transaction; these would later be merged into larger disk-based indexes and then flushed to make them ready to absorb new fresh docs. We even had further partitioning of the indexes that reflected time periods, with caps on size, so that they could be merged into older, more archive-based indexes which were used less (yes, the search engine's default search was on data no more than 1 month old, though the user could open the time window by including archives).
As for SOLR and OCEAN, I would argue that these semi-structured search engines are becoming more and more like relational databases with full-text search capabilities (without the benefit of full relational algebra -- for example, joins are not possible using SOLR). Notice that "real-time" CRUD operations and transactionality are core DB concepts and have been studied and developed by database communities for quite a long time. There have been recent efforts on how to efficiently integrate Lucene into relational databases (see the Lucene/Oracle JVM integration, http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html ).

I think we should seriously look at joining efforts with open-source database engine projects written in Java (see http://java-source.net/open-source/database-engines) in order to blend IR and ORM once and for all.

-- Joaquin

> I've read Jason's Wiki as well. Actually, I had to read it a number of
> times to understand bits and pieces of it. I have to admit there is still
> some fuzziness about the whole thing in my head - is "Ocean" something that
> already works, a separate project on googlecode.com? I think so. If so,
> and if you are working on getting it integrated into Lucene, would it make
> it less confusing to just refer to it as "real-time search", so there is no
> confusion?
>
> If this is to be initially integrated into Lucene, why are things like
> replication, crowding/field collapsing, locallucene, name service, tag
> index, etc. all mentioned there on the Wiki and bundled with the description of
> how real-time search works and is to be implemented? I suppose mentioning
> replication kind-of makes sense because the replication approach is closely
> tied to real-time search - all query nodes need to see index changes fast.
> But Lucene itself offers no replication mechanism, so maybe the replication
> is something to figure out separately, say on the Solr level, later on "once
> we get there".
> I think even just the essential real-time search requires
> substantial changes to Lucene (I remember seeing large patches in JIRA),
> which makes it hard to digest, understand, comment on, and ultimately commit
> (hence the lukewarm response, I think). Bringing other non-essential
> elements into discussion at the same time makes it more difficult to
> process all this new stuff, at least for me. Am I the only one who finds
> this hard?
>
> That said, it sounds like we have some discussion going (Karl...), so I
> look forward to understanding more! :)
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> > - Original Message
> > From: Yonik Seeley <[EMAIL PROTECTED]>
> > To: java-dev@lucene.apache.org
> > Sent: Thursday, September 4, 2008 10:13:32 AM
> > Subject: Re: Realtime Search for Social Networks Collaboration
> >
> > On Wed, Sep 3, 2008 at 6:50 PM, Jason Rutherglen wrote:
> > > I also think it's got a
> > > lot of things now which m
Re: Realtime Search for Social Networks Collaboration
Interesting discussion.

>>I think we should seriously look at joining efforts with open-source Database
>>engine projects

I posted some initial dabblings here with a couple of the databases on your list: http://markmail.org/message/3bu5klzzc5i6uhl7 but this is not really a scalable solution (which is what Jason and others need).

>>for example joins are not possible using SOLR).

It's largely *because* Lucene doesn't do joins that it can be made to scale out. I've replaced two large-scale database systems this year with distributed Lucene solutions because this scale-out architecture provided significantly better performance. These were "semi-structured" systems too. Lucene's comparatively simplistic data model/query model is both a weakness and a strength in this regard.

Cheers,
Mark.
[jira] Commented: (LUCENE-1131) Add numDeletedDocs to IndexReader
[ https://issues.apache.org/jira/browse/LUCENE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628953#action_12628953 ]

Michael McCandless commented on LUCENE-1131:

Hmm -- this breaks back compat (adds a new abstract method to IndexReader). Why don't we fall back to a default impl, in IndexReader, of maxDoc() - numDocs()? The patch is much less invasive, and we don't break back compat. maxDoc() is indeed cheap.

> Add numDeletedDocs to IndexReader
> ---------------------------------
>
> Key: LUCENE-1131
> URL: https://issues.apache.org/jira/browse/LUCENE-1131
> Project: Lucene - Java
> Issue Type: New Feature
> Reporter: Shai Erera
> Assignee: Otis Gospodnetic
> Priority: Minor
> Fix For: 2.4
>
> Attachments: LUCENE-1131.patch
>
>
> Add numDeletedDocs to IndexReader. Basically, the implementation is as simple
> as doing:
> public int numDeletedDocs() {
>   return deletedDocs == null ? 0 : deletedDocs.count();
> }
> in SegmentReader.
> Patch to follow to include in all IndexReader extensions.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1354) Provide Programmatic Access to CheckIndex
[ https://issues.apache.org/jira/browse/LUCENE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1354: --- Fix Version/s: 2.4 > Provide Programmatic Access to CheckIndex > - > > Key: LUCENE-1354 > URL: https://issues.apache.org/jira/browse/LUCENE-1354 > Project: Lucene - Java > Issue Type: New Feature >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Fix For: 2.4 > > Attachments: LUCENE-1354.patch, LUCENE-1354.patch > > > Would be nice to have programmatic access to the CheckIndex tool, so that it > can be used in applications like Solr. > See SOLR-566 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Make auto fix delay configurable in CheckIndex.checkIndex?
OK -- I like that suggestion, Andrew, so I incorporated it into a new patch on LUCENE-1354. Now it's CheckIndex's static main() that does that sleep and then calls fix. This way you can call fix directly from your code.

Mike

Andrew Zhang wrote:

> On Sat, Sep 6, 2008 at 12:01 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
>
>> This definitely makes sense -- there is an issue opened, with initial
>> patch, to make programmatic access to CheckIndex possible, that may
>> already cover this?
>
> Hi,
>
> Thanks for the information! It's
> https://issues.apache.org/jira/browse/LUCENE-1354
>
> I took a look at the initial patch, but it still sleeps 5 seconds before
> doing auto fix.
>
> We may make it configurable, or provide a method fix() for the end user? i.e.
>
> IndexChecker checker = new IndexChecker();
> boolean ok = checker.check();
> if (!ok) {
>   checker.fix(); // or do some other thing?
> }
>
>> Mike
>>
>> Andrew Zhang wrote:
>>
>>> Hi,
>>>
>>> Currently, CheckIndex.checkIndex sleeps 5 seconds before fixing a
>>> corrupted index. Does it make sense to make it configurable? Some
>>> applications just want to fix it asap.
>>>
>>> --
>>> Best regards,
>>> Andrew Zhang
>>>
>>> db4o - database for Android: www.db4o.com
>>> http://zhanghuangzhu.blogspot.com/
>
> --
> Best regards,
> Andrew Zhang
>
> db4o - database for Android: www.db4o.com
> http://zhanghuangzhu.blogspot.com/

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
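The split described above (checking and fixing as separate calls, with the pause living only in the command-line path) could look roughly like the sketch below. Everything in it is hypothetical: SketchIndexChecker, its "bad" segment-name convention, and checkAndFixWithDelay are made up for illustration and are not the CheckIndex API from the LUCENE-1354 patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical checker for illustration only; this is NOT Lucene's CheckIndex. */
class SketchIndexChecker {
    private final List<String> segments;                  // stand-in for index segments
    private final List<String> corrupt = new ArrayList<>();

    SketchIndexChecker(List<String> segments) {
        this.segments = new ArrayList<>(segments);
    }

    /** Inspects only: never modifies the index and never sleeps. */
    boolean check() {
        corrupt.clear();
        for (String s : segments) {
            if (s.startsWith("bad")) {                    // assumption: "bad*" marks corruption
                corrupt.add(s);
            }
        }
        return corrupt.isEmpty();
    }

    /** Drops the segments flagged by the last check(); returns how many were removed. */
    int fix() {
        int removed = corrupt.size();
        segments.removeAll(corrupt);
        corrupt.clear();
        return removed;
    }

    List<String> segments() {
        return segments;
    }

    /** Command-line style path: pause so the user can abort, then fix. Only this path sleeps. */
    static int checkAndFixWithDelay(SketchIndexChecker checker, long delayMillis) {
        if (checker.check()) {
            return 0;
        }
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();           // treat interruption as "do not fix"
            return 0;
        }
        return checker.fix();
    }
}
```

An application that wants to repair immediately would call check() and then fix() itself, while an interactive tool would go through checkAndFixWithDelay with whatever delay it prefers.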
[jira] Updated: (LUCENE-1354) Provide Programmatic Access to CheckIndex
[ https://issues.apache.org/jira/browse/LUCENE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1354: --- Attachment: LUCENE-1354.patch Hi Grant, the patch looks good! I tweaked it a bit, to pass all tests, and also pulled out a separate fix() method as suggested here: http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200809.mbox/%3C4d0b24970809061944n5c617b36xc2951d74d989dc42%40mail.gmail.com%3E If this looks good can you commit for 2.4? > Provide Programmatic Access to CheckIndex > - > > Key: LUCENE-1354 > URL: https://issues.apache.org/jira/browse/LUCENE-1354 > Project: Lucene - Java > Issue Type: New Feature >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Fix For: 2.4 > > Attachments: LUCENE-1354.patch, LUCENE-1354.patch > > > Would be nice to have programmatic access to the CheckIndex tool, so that it > can be used in applications like Solr. > See SOLR-566 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1344) Make the Lucene jar an OSGi bundle
[ https://issues.apache.org/jira/browse/LUCENE-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628965#action_12628965 ]

Michael McCandless commented on LUCENE-1344:

Thanks Nicolas. I understand a bit more now :)

One problem: even though I was able to successfully run the above command, the resulting MANIFEST.MF in the Lucene core JAR (dist/maven/org/apache/lucene/lucene-core/2.3.0/lucene-core-2.3.0.jar) does not have any of your added lines (eg Export-Package) -- do you see this too?

{quote}
About the different version schemes, yep, this is yet another one to maintain. The version number taken into account in an OSGi environment is "Bundle-Version"; I don't know what the header "Specification-Version" is used for. I tried to refactor the build system a little bit to generate the version numbers, but I failed; a bigger patch would be needed (I am willing to do some if needed).
{quote}

I think it's OK for now if we have to update the versions in META-INF/MANIFEST.MF manually as part of the release process? (It sounds hard to get the build to autogen the versions.)

> Make the Lucene jar an OSGi bundle
> ----------------------------------
>
> Key: LUCENE-1344
> URL: https://issues.apache.org/jira/browse/LUCENE-1344
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Build
> Reporter: Nicolas Lalevée
> Fix For: 2.4
>
> Attachments: LUCENE-1344-r679133.patch, LUCENE-1344-r690675.patch,
> LUCENE-1344-r690691.patch, MANIFEST.MF.diff
>
>
> In order to use Lucene in an OSGi environment, some additional headers are
> needed in the manifest of the jar. As Lucene has no dependency, it is pretty
> straightforward and it will be easy to maintain, I think.
[jira] Commented: (LUCENE-1131) Add numDeletedDocs to IndexReader
[ https://issues.apache.org/jira/browse/LUCENE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628967#action_12628967 ]

Shai Erera commented on LUCENE-1131:

What if we implement numDeletedDocs() in IndexReader, instead of defining it abstract? Those that extend IndexReader (outside the scope of the attached patch) can then choose to override the implementation or not.

The purpose of the patch is to add an explicit method which developers can use, rather than having to understand the logic of maxDoc() - numDocs(). Not all extended classes implement it this way, BTW. SegmentReader just calls deletedDocs.count(), rather than calling the two separate methods.

> Add numDeletedDocs to IndexReader
> ---------------------------------
>
> Key: LUCENE-1131
> URL: https://issues.apache.org/jira/browse/LUCENE-1131
> Project: Lucene - Java
> Issue Type: New Feature
> Reporter: Shai Erera
> Assignee: Otis Gospodnetic
> Priority: Minor
> Fix For: 2.4
>
> Attachments: LUCENE-1131.patch
>
>
> Add numDeletedDocs to IndexReader. Basically, the implementation is as simple
> as doing:
> public int numDeletedDocs() {
>   return deletedDocs == null ? 0 : deletedDocs.count();
> }
> in SegmentReader.
> Patch to follow to include in all IndexReader extensions.
[jira] Commented: (LUCENE-1354) Provide Programmatic Access to CheckIndex
[ https://issues.apache.org/jira/browse/LUCENE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628966#action_12628966 ] Grant Ingersoll commented on LUCENE-1354: - will do. -- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ > Provide Programmatic Access to CheckIndex > - > > Key: LUCENE-1354 > URL: https://issues.apache.org/jira/browse/LUCENE-1354 > Project: Lucene - Java > Issue Type: New Feature >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Fix For: 2.4 > > Attachments: LUCENE-1354.patch, LUCENE-1354.patch > > > Would be nice to have programmatic access to the CheckIndex tool, so that it > can be used in applications like Solr. > See SOLR-566 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Food for Thought: Why Search Engines Choke
http://arnoldit.com/wordpress/2008/09/06/text-processing-why-servers-choke/ Some interesting ideas here on speeding up Lucene. (Thanks to Erik for passing me the link) Note, the paper is comparing against 2.2. It would be good to put up numbers for 2.3, and it might be interesting to look into the ideas presented to see if we can learn anything from it - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Realtime Search for Social Networks Collaboration
On Sun, Sep 7, 2008 at 2:41 AM, mark harwood <[EMAIL PROTECTED]> wrote:

> >>for example joins are not possible using SOLR).
>
> It's largely *because* Lucene doesn't do joins that it can be made to scale
> out. I've replaced two large-scale database systems this year with
> distributed Lucene solutions because this scale-out architecture provided
> significantly better performance. These were "semi-structured" systems too.
> Lucene's comparatively simplistic data model/query model is both a weakness
> and a strength in this regard.

Hey, maybe the right way to go for a truly scalable and high-performance semi-structured database is to marry HBase (Bigtable-like data storage) with SOLR/Lucene. I concur with you in the sense that simplistic data models coupled with high performance are the killer.

Let me quote this from the original Bigtable paper from Google:

"Bigtable does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format, and allows clients to reason about the locality properties of the data represented in the underlying storage. Data is indexed using row and column names that can be arbitrary strings. Bigtable also treats data as uninterpreted strings, although clients often serialize various forms of structured and semi-structured data into these strings. Clients can control the locality of their data through careful choices in their schemas. Finally, Bigtable schema parameters let clients dynamically control whether to serve data out of memory or from disk."
Re: Realtime Search for Social Networks Collaboration
BTW, quoting Marcelo Ochoa (the developer behind the Oracle/Lucene implementation), the three minimal features a transactional DB should support for Lucene integration are:

1) The ability to define new functions (e.g. lcontains(), lscore()) which would allow binding queries to Lucene and obtaining document/row scores.
2) An API that would allow DML intercepts, like Oracle's ODCI.
3) The ability to extend and/or implement new types of "domain" indexes that the engine's query evaluation and execution/optimization planner can use efficiently.

Thanks Marcelo.

-- Joaquin

On Sun, Sep 7, 2008 at 8:16 AM, J. Delgado <[EMAIL PROTECTED]> wrote:

> On Sun, Sep 7, 2008 at 2:41 AM, mark harwood <[EMAIL PROTECTED]> wrote:
>
>> >>for example joins are not possible using SOLR).
>>
>> It's largely *because* Lucene doesn't do joins that it can be made to
>> scale out. I've replaced two large-scale database systems this year with
>> distributed Lucene solutions because this scale-out architecture provided
>> significantly better performance. These were "semi-structured" systems too.
>> Lucene's comparatively simplistic data model/query model is both a weakness
>> and a strength in this regard.
>
> Hey, maybe the right way to go for a truly scalable and high-performance
> semi-structured database is to marry HBase (Bigtable-like data storage)
> with SOLR/Lucene. I concur with you in the sense that simplistic data models
> coupled with high performance are the killer.
>
> Let me quote this from the original Bigtable paper from Google:
>
> "Bigtable does not support a full relational data model; instead, it
> provides clients with a simple data model that supports dynamic control over
> data layout and format, and allows clients to reason about the locality
> properties of the data represented in the underlying storage. Data is
> indexed using row and column names that can be arbitrary strings. Bigtable
> also treats data as uninterpreted strings, although clients often serialize
> various forms of structured and semi-structured data into these strings.
> Clients can control the locality of their data through careful choices in
> their schemas. Finally, Bigtable schema parameters let clients dynamically
> control whether to serve data out of memory or from disk."
[jira] Updated: (LUCENE-1366) Rename Field.Index.UN_TOKENIZED/TOKENIZED/NO_NORMS
[ https://issues.apache.org/jira/browse/LUCENE-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1366: --- Attachment: LUCENE-1366.patch OK, this patch switches over all uses of the old names to the new ones. I plan to commit in a day or two. > Rename Field.Index.UN_TOKENIZED/TOKENIZED/NO_NORMS > -- > > Key: LUCENE-1366 > URL: https://issues.apache.org/jira/browse/LUCENE-1366 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 1.9, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.3.2 >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 2.4 > > Attachments: LUCENE-1366.patch, LUCENE-1366.patch > > > There is confusion about these current Field options and I think we > should rename them, deprecating the old names in 2.4/2.9 and removing > them in 3.0. How about this: > {code} > TOKENIZED --> ANALYZED > UN_TOKENIZED --> NOT_ANALYZED > NO_NORMS --> NOT_ANALYZED_NO_NORMS > {code} > Should we also add ANALYZED_NO_NORMS? > Spinoff from here: > > http://mail-archives.apache.org/mod_mbox/lucene-java-user/200808.mbox/%3C48a3076a.2679420a.1c53.a5c4%40mx.google.com%3E > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
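One way to picture the rename-with-deprecation plan from the issue text is the sketch below. It is only an illustration under assumptions: SketchIndexOption is a made-up class, not the actual Field.Index source or the attached patch. The new names are the real constants and the old names are deprecated aliases pointing at the same instances, so existing callers keep working until the old names are removed.

```java
/** Hypothetical stand-in for an option class; NOT the actual Lucene Field.Index source. */
final class SketchIndexOption {
    private final String name;

    private SketchIndexOption(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return name;
    }

    // New, clearer names.
    static final SketchIndexOption ANALYZED = new SketchIndexOption("ANALYZED");
    static final SketchIndexOption NOT_ANALYZED = new SketchIndexOption("NOT_ANALYZED");
    static final SketchIndexOption NOT_ANALYZED_NO_NORMS =
            new SketchIndexOption("NOT_ANALYZED_NO_NORMS");

    // Old names kept as aliases to the very same instances, so identity
    // comparisons (==) in existing code keep behaving as before.

    /** @deprecated use {@link #ANALYZED} */
    @Deprecated
    static final SketchIndexOption TOKENIZED = ANALYZED;

    /** @deprecated use {@link #NOT_ANALYZED} */
    @Deprecated
    static final SketchIndexOption UN_TOKENIZED = NOT_ANALYZED;

    /** @deprecated use {@link #NOT_ANALYZED_NO_NORMS} */
    @Deprecated
    static final SketchIndexOption NO_NORMS = NOT_ANALYZED_NO_NORMS;
}
```

Aliasing rather than creating separate instances matters if any code compares options by identity; the compiler then only emits deprecation warnings at the old call sites.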
[jira] Commented: (LUCENE-1131) Add numDeletedDocs to IndexReader
[ https://issues.apache.org/jira/browse/LUCENE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628991#action_12628991 ]

Michael McCandless commented on LUCENE-1131:

bq. What if we implement numDeletedDocs() in IndexReader, instead of defining it abstract?

Right, that's exactly what I'm thinking, with this body:

{code}
public int numDeletedDocs() {
  return maxDoc() - numDocs();
}
{code}

Then I think no classes need to override it (perf cost of calling 2 methods is tiny)?

> Add numDeletedDocs to IndexReader
> ---------------------------------
>
> Key: LUCENE-1131
> URL: https://issues.apache.org/jira/browse/LUCENE-1131
> Project: Lucene - Java
> Issue Type: New Feature
> Reporter: Shai Erera
> Assignee: Otis Gospodnetic
> Priority: Minor
> Fix For: 2.4
>
> Attachments: LUCENE-1131.patch
>
>
> Add numDeletedDocs to IndexReader. Basically, the implementation is as simple
> as doing:
> public int numDeletedDocs() {
>   return deletedDocs == null ? 0 : deletedDocs.count();
> }
> in SegmentReader.
> Patch to follow to include in all IndexReader extensions.
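To make the back-compat argument concrete, here is a minimal sketch with stand-in classes. SketchReader, LegacySketchReader and BitSetSketchReader are hypothetical and are not Lucene's IndexReader or SegmentReader; java.util.BitSet stands in for the deleted-docs structure. A concrete default in the base class means subclasses written before the method existed still compile, while a subclass that tracks deletions directly can still override it.

```java
import java.util.BitSet;

/** Hypothetical stand-in for a reader base class; NOT Lucene's IndexReader. */
abstract class SketchReader {
    /** One greater than the largest document number, deleted docs included. */
    abstract int maxDoc();

    /** Number of live (non-deleted) documents. */
    abstract int numDocs();

    /** Concrete default: existing subclasses keep compiling unchanged. */
    int numDeletedDocs() {
        return maxDoc() - numDocs();
    }
}

/** A subclass written before numDeletedDocs() existed; it simply inherits the default. */
class LegacySketchReader extends SketchReader {
    private final int maxDoc;
    private final int deleted;

    LegacySketchReader(int maxDoc, int deleted) {
        this.maxDoc = maxDoc;
        this.deleted = deleted;
    }

    @Override
    int maxDoc() {
        return maxDoc;
    }

    @Override
    int numDocs() {
        return maxDoc - deleted;
    }
}

/** A subclass that tracks deletions in a bit set and chooses to override the default. */
class BitSetSketchReader extends SketchReader {
    private final int maxDoc;
    private final BitSet deletedDocs; // null means "no deletions"

    BitSetSketchReader(int maxDoc, BitSet deletedDocs) {
        this.maxDoc = maxDoc;
        this.deletedDocs = deletedDocs;
    }

    @Override
    int maxDoc() {
        return maxDoc;
    }

    @Override
    int numDocs() {
        return maxDoc - numDeletedDocs();
    }

    @Override
    int numDeletedDocs() {
        return deletedDocs == null ? 0 : deletedDocs.cardinality();
    }
}
```

Had numDeletedDocs() been declared abstract instead, LegacySketchReader would stop compiling, which is the back-compat break raised earlier in the thread.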
[jira] Resolved: (LUCENE-1369) Eliminate unnecessary uses of Hashtable and Vector
[ https://issues.apache.org/jira/browse/LUCENE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1369.

Resolution: Fixed
Lucene Fields: [New, Patch Available] (was: [Patch Available, New])

Thanks DM!

> Eliminate unnecessary uses of Hashtable and Vector
> --------------------------------------------------
>
> Key: LUCENE-1369
> URL: https://issues.apache.org/jira/browse/LUCENE-1369
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 2.3.2
> Reporter: DM Smith
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.4
>
> Attachments: LUCENE-1369.patch
>
>
> Lucene uses Vector, Hashtable and Enumeration when it doesn't need to.
> Changing to ArrayList and HashMap may provide better performance.
> There are a few places Vector shows up in the API. IMHO, List should have
> been used for parameters and return values.
> There are a few distinct usages of these classes:
> # internal but with ArrayList or HashMap would do as well. These can simply
> be replaced.
> # internal and synchronization is required. Either leave as is or use a
> collections synchronization wrapper.
> # As a parameter to a method where List or Map would do as well. For contrib,
> just replace. For core, deprecate current and add new method signature.
> # Generated by JavaCC. (All *.jj files.) Nothing to be done here.
> # As a base class. Not sure what to do here. (Only applies to SegmentInfos
> extends Vector, but it is not used in a safe manner in all places. Perhaps,
> implements List would be better.)
> # As a return value from a package protected method, but synchronization is
> not used. Change return type.
> # As a return value to a final method. Change to List or Map.
> In using a Vector the following iteration pattern is frequently used.
> for (int i = 0; i < v.size(); i++) {
>   Object o = v.elementAt(i);
> }
> This is an indication that synchronization is unimportant. The list could
> change during iteration.
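The first two cases in the issue description, and the iteration remark at its end, can be sketched with plain java.util classes. This is an illustration, not code from the LUCENE-1369 patch: CollectionsSketch and its method names are invented, and it uses generics for brevity, which is an assumption about the Java level rather than something the thread states.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative only; the class and method names are made up for this sketch. */
class CollectionsSketch {

    /** Internal, single-threaded use: ArrayList replaces Vector directly. */
    static List<String> internalNames() {
        List<String> names = new ArrayList<>();
        names.add("a");
        names.add("b");
        return names;
    }

    /** Synchronization required: wrap the list instead of using Vector. */
    static List<String> sharedNames() {
        return Collections.synchronizedList(new ArrayList<>());
    }

    /**
     * Each call on a synchronized wrapper is atomic on its own, but a whole
     * traversal is not. Holding the wrapper's lock for the loop is what keeps
     * the list from changing mid-iteration, which the index-based
     * size()/elementAt() loop over a Vector does not do.
     */
    static int totalLength(List<String> shared) {
        int total = 0;
        synchronized (shared) {
            for (String s : shared) {
                total += s.length();
            }
        }
        return total;
    }
}
```

Declaring parameters and return types as List (or Map) rather than the concrete class also covers the API cases in the list: callers can pass either kind of implementation.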
Re: Make auto fix delay configurable in CheckIndex.checkIndex?
On Sun, Sep 7, 2008 at 6:54 PM, Michael McCandless < [EMAIL PROTECTED]> wrote: > > OK -- I like that suggestion Andrew, so I incorporated it into new patch on > LUCENE-1354. Now, it's CheckIndex's static main() that does that sleep, and > then calls fix. This way you can call fix directly from your code. Great! I see the fix in the patch. Thanks a lot, Mike! > > > Mike > > > Andrew Zhang wrote: > > >> >> On Sat, Sep 6, 2008 at 12:01 AM, Michael McCandless < >> [EMAIL PROTECTED]> wrote: >> >> This definitely makes sense -- there is an issue opened, with initial >> patch, to make programmatic access to CheckIndex possible, that may already >> cover this? >> >> Hi, >> >> Thanks for the information! It's >> https://issues.apache.org/jira/browse/LUCENE-1354 >> >> I took a look at the initial patch, but it still sleeps 5 seconds before >> doing auto fix. >> >> We may make it configurable, or provide a method fix() for end user? i.e. >> >> IndexChecker checker = new IndexChecker(); >> boolean ok = checker.check(); >> if(!ok) { >> checker.fix(); // or do some other thing? >> } >> >> >> Mike >> >> >> Andrew Zhang wrote: >> >> Hi, >> >> Currently, CheckIndex.checkIndex sleeps 5 seconds before fixing corrupted >> index. Does it make sense to make it configurable? Some applications just >> want to fix it asap. >> >> -- >> Best regards, >> Andrew Zhang >> >> db4o - database for Android: www.db4o.com >> http://zhanghuangzhu.blogspot.com/ >> >> >> - >> To unsubscribe, e-mail: [EMAIL PROTECTED] >> For additional commands, e-mail: [EMAIL PROTECTED] >> >> >> >> >> -- >> Best regards, >> Andrew Zhang >> >> db4o - database for Android: www.db4o.com >> http://zhanghuangzhu.blogspot.com/ >> > > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > -- Best regards, Andrew Zhang db4o - database for Android: www.db4o.com http://zhanghuangzhu.blogspot.com/
Re: Realtime Search for Social Networks Collaboration
Hi,

- Original Message
From: J. Delgado <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Sunday, September 7, 2008 4:04:58 AM
Subject: Re: Realtime Search for Social Networks Collaboration

> On Sat, Sep 6, 2008 at 1:36 AM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
>
>> Regarding real-time search and Solr, my feeling is the focus should be on
>> first adding real-time search to Lucene, and then we'll figure out how to
>> incorporate that into Solr later.
>
> Otis, what do you mean exactly by "adding real-time search to Lucene"?
> Note that Lucene, being an indexing/search library (and not a full-blown
> search engine), is by definition "real-time": once you add/write a document
> to the index it becomes immediately searchable, and once a document is
> logically deleted it is no longer returned in a search, though physical
> deletion happens during an index optimization.

OG: When I think about real-time search I see it as: "Make the newly added document show up in search results without closing and reopening the whole index with IndexWriter. In other words, minimize re-reading of the old/unchanged data just to be able to see the newly added data." I believe this is similar to what IndexReader.reopen does, and Jason does make use of it.

Otis

> Now, the problem of adding/deleting documents in bulk, as part of a
> transaction, and making these documents available for search immediately
> after the transaction is committed sounds more like a search engine problem
> (i.e. SOLR, Nutch, Ocean), especially if these transactions are known to be
> I/O expensive and thus are usually implemented as batched processes with
> some kind of sync mechanism, which makes them non-real-time.
>
> For example, in my previous life, I designed and helped implement a
> quasi-realtime enterprise search engine using Lucene, having a set of
> multi-threaded indexers hitting a set of multiple indexes allocated across
> different search services which powered a broker-based distributed search
> interface. The most recent documents provided to the indexers were always
> added to the smaller in-memory (RAM) indexes, which usually could absorb the
> load of a bulk "add" transaction; these would later be merged into larger
> disk-based indexes and then flushed to make them ready to absorb new fresh
> docs. We even had further partitioning of the indexes that reflected time
> periods, with caps on size, so that they could be merged into older, more
> archive-based indexes which were used less (yes, the search engine's default
> search was on data no more than 1 month old, though the user could open the
> time window by including archives).
>
> As for SOLR and OCEAN, I would argue that these semi-structured search
> engines are becoming more and more like relational databases with full-text
> search capabilities (without the benefit of full relational algebra -- for
> example, joins are not possible using SOLR). Notice that "real-time" CRUD
> operations and transactionality are core DB concepts and have been studied
> and developed by database communities for quite a long time. There have been
> recent efforts on how to efficiently integrate Lucene into relational
> databases (see the Lucene/Oracle JVM integration,
> http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html)
>
> I think we should seriously look at joining efforts with open-source
> database engine projects written in Java (see
> http://java-source.net/open-source/database-engines) in order to blend IR
> and ORM once and for all.
>
> -- Joaquin
>
>> I've read Jason's Wiki as well. Actually, I had to read it a number of
>> times to understand bits and pieces of it. I have to admit there is still
>> some fuzziness about the whole thing in my head - is "Ocean" something that
>> already works, a separate project on googlecode.com? I think so. If so,
>> and if you are working on getting it integrated into Lucene, would it make
>> it less confusing to just refer to it as "real-time search", so there is no
>> confusion?
>>
>> If this is to be initially integrated into Lucene, why are things like
>> replication, crowding/field collapsing, locallucene, name service, tag
>> index, etc. all mentioned there on the Wiki and bundled with the description
>> of how real-time search works and is to be implemented? I suppose mentioning
>> replication kind-of makes sense because the replication approach is closely
>> tied to real-time search - all query nodes need to see index changes fast.
>> But Lucene itself offers no replication mechanism, so maybe the replication
>> is something to figure out separately, say on the Solr level, later on "once
>> we get there". I think even just the essential real-time search requires
>> substantial changes to Lucene (I remember seeing large patches in JIRA),
>> which makes it hard to digest, understand, comment on, and ultimately commit
>> (hence the lukewarm response, I think). Bringing other non-essential
>> elements into discussion at the same time makes it more difficult to
>> process all this new stuff, at least for
[jira] Created: (LUCENE-1378) Remove remaining @author references
Remove remaining @author references --- Key: LUCENE-1378 URL: https://issues.apache.org/jira/browse/LUCENE-1378 Project: Lucene - Java Issue Type: Task Reporter: Otis Gospodnetic Priority: Trivial Fix For: 2.4 Attachments: LUCENE-1378.patch $ find . -name \*.java | xargs grep '@author' | cut -d':' -f1 | xargs perl -pi -e 's/ [EMAIL PROTECTED]//' -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1378) Remove remaining @author references
[ https://issues.apache.org/jira/browse/LUCENE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated LUCENE-1378: - Attachment: LUCENE-1378.patch > Remove remaining @author references > --- > > Key: LUCENE-1378 > URL: https://issues.apache.org/jira/browse/LUCENE-1378 > Project: Lucene - Java > Issue Type: Task >Reporter: Otis Gospodnetic >Priority: Trivial > Fix For: 2.4 > > Attachments: LUCENE-1378.patch > > > $ find . -name \*.java | xargs grep '@author' | cut -d':' -f1 | xargs perl > -pi -e 's/ [EMAIL PROTECTED]//' -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1378) Remove remaining @author references
[ https://issues.apache.org/jira/browse/LUCENE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629088#action_12629088 ]

Paul Elschot commented on LUCENE-1378:

The patch of 20080907 has some commented code added in SweetSpotSimilarityTest, probably unwanted.

Also, author lines are replaced by empty comment lines; perhaps it's better to remove these lines completely. I didn't see any place where that could go wrong by changing the perl substitute command to do so, and the compiler would find such possible comment errors anyway.

> Remove remaining @author references
> -----------------------------------
>
> Key: LUCENE-1378
> URL: https://issues.apache.org/jira/browse/LUCENE-1378
> Project: Lucene - Java
> Issue Type: Task
> Reporter: Otis Gospodnetic
> Priority: Trivial
> Fix For: 2.4
>
> Attachments: LUCENE-1378.patch
>
>
> $ find . -name \*.java | xargs grep '@author' | cut -d':' -f1 | xargs perl
> -pi -e 's/ [EMAIL PROTECTED]//'
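As an illustration of the suggestion to remove the author lines completely rather than leave empty comment lines behind, here is a small sketch. It is written in Java instead of the perl one-liner from the issue, and the class name and the regular expression are assumptions made for this example; the substitution actually used in the patch is not recoverable from the archived text.

```java
import java.util.regex.Pattern;

/** Illustrative helper; the name and the pattern are assumptions, not the patch's command. */
class AuthorTagStripper {
    // Matches a whole line such as " * @author Jane Doe", including its line
    // terminator, so that no empty " *" comment line is left behind.
    private static final Pattern AUTHOR_LINE =
            Pattern.compile("(?m)^[ \\t]*\\*?[ \\t]*@author\\b.*(?:\\r?\\n|$)");

    /** Returns the source text with every @author line dropped. */
    static String strip(String source) {
        return AUTHOR_LINE.matcher(source).replaceAll("");
    }
}
```

Because the match consumes the line terminator as well, the surrounding Javadoc comment stays well-formed: the lines before and after the removed tag simply become adjacent.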