Re: Upgrade to 3.6 OR wait for 4.0
I have to use stable versions too, and that's why I delayed upgrading my code until 4.0-ALPHA was out. Since I don't have any problems with API breaks, i.e. I'm only concerned with index format back-compat, 4.0-ALPHA was stable enough for me.

If you require both a stable index format and a stable API, then wait for 4.0-BETA.

4.0 will probably include more hardening of the code after 4.0-BETA, which likely means bug fixes and such. If that is your definition of 'stable', then wait for it.

As for timelines, I have no idea :). It took nearly a year to stabilize the code (and index format) enough for 4.0-ALPHA to be released. I hope that 4.0-BETA and 4.0.0 won't be long from now :)

Shai

On Tue, Jul 10, 2012 at 9:21 AM, Ganesh wrote:

> Thanks for the reply. Any idea how much time it would take to go for 4.0
> stable release? I want to go for v4.0 but i have to use only the stable
> version.
>
> Regards
> Ganesh
>
> - Original Message -
> From: "Shai Erera"
> Sent: Tuesday, July 10, 2012 10:50 AM
> Subject: Re: Upgrade to 3.6 OR wait for 4.0
>
> > Hi Ganesh
> >
> > I recently upgraded my code to 3.6, and yesterday finished part of my
> > upgrades to 4.0-ALPHA.
> >
> > Upgrading from 3.0.3 to 3.6 is relatively easy as all API should be
> > backwards compatible. But I think there were some API breaks, and
> > back-compat issues. Therefore, if I were you, I'd first upgrade from
> > 3.0.3 to 3.6, resolving all 'deprecated' API warnings and making sure
> > the back-compat issues do not affect me (or resolve them too !).
> >
> > Then, I'd upgrade to 4.0-ALPHA. A lot of API has been changed, and so
> > most likely you'll need to touch large parts of your code again.
> >
> > Going this route, you gain all the new features and enhancements of
> > 3.6, while knowing that you run on a 'stable' Lucene version.
> > Upgrading to 4.0-ALPHA comes with even more gains, but this release
> > will probably go under some API changes (API is expected to freeze in
> > BETA), though the index format is not going to change in incompatible
> > ways (unless there's a bug ... you can read the release notes), so
> > depending on how much you want to risk doing the upgrade for a still
> > 'work in progress' code.
> >
> > Hope this helps.
> >
> > Shai
> >
> > On Tue, Jul 10, 2012 at 7:28 AM, Ganesh wrote:
> >
> >> Hello all,
> >>
> >> I am currently using v3.0.3 and planning to upgrade to v3.6. Shall i go
> >> ahead with the upgrade OR wait for 4.0?
> >>
> >> Regards
> >> Ganesh
> >>
> >> -
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: about some seacher(I'm new hand, thank you for help)
But how can I use it in Lucene?

    File logFile = new File("D:\\logFile");
    BufferedReader reader = null;
    String str = null;
    reader = new BufferedReader(new FileReader(logFile));
    while ((str = reader.readLine()) != null) {
        String timestamp = str.substring(1, 13);
        String content = str.substring(14).trim();
    }

This way we can get the data, but

    document.add(new Field("content", content, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));

must be wrong.

-- View this message in context: http://lucene.472066.n3.nabble.com/about-some-seacher-I-m-new-hand-thank-you-for-help-tp3993397p3994093.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Re: about some seacher(I'm new hand, thank you for help)
Hi sam,

You can add the content and timestamp fields like this:

    // add content
    doc.add(new Field("contents", content, Field.Store.NO, Field.Index.ANALYZED));
    // add timestamp
    NumericField timestampField = new NumericField("timestamp");
    timestampField.setLongValue(DateField.stringToTime(timestamp));
    doc.add(timestampField);

To perform range querying or filtering against a NumericField, use NumericRangeQuery or NumericRangeFilter. See http://wiki.apache.org/lucene-java/SearchNumericalFields for more information.

On Tue, Jul 10, 2012 at 3:38 PM, sam wrote:

> timestamp

--
Don't Grow Old, Grow Up... :-)
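A note on the timestamp conversion in the reply above: as far as I recall, DateField.stringToTime in 3.x only decodes strings that were produced by DateField.timeToString, so a raw log timestamp most likely has to be parsed into a long first and then passed to setLongValue. The sketch below uses only the JDK and reuses the substring offsets from the question. The timestamp layout ("yyyyMMddHHmm", UTC) and the names LogLineParser and Entry are assumptions made for illustration; the real log format is not shown in the thread.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper: splits one log line into an epoch-millis timestamp and its text. */
final class LogLineParser {

    /** Parsed result; timestampMillis is milliseconds since 1970-01-01T00:00:00Z. */
    static final class Entry {
        final long timestampMillis;
        final String content;

        Entry(long timestampMillis, String content) {
            this.timestampMillis = timestampMillis;
            this.content = content;
        }
    }

    // ASSUMPTION: characters 1..12 of each line look like "yyyyMMddHHmm" in UTC.
    private static final String ASSUMED_PATTERN = "yyyyMMddHHmm";

    static Entry parse(String line) {
        if (line.length() < 14) {
            throw new IllegalArgumentException("line too short: " + line);
        }
        // Same slicing as in the question: chars 1..12 are the timestamp, the rest is content.
        String timestamp = line.substring(1, 13);
        String content = line.substring(14).trim();

        // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
        SimpleDateFormat format = new SimpleDateFormat(ASSUMED_PATTERN, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        format.setLenient(false); // reject impossible values such as month 13
        try {
            return new Entry(format.parse(timestamp).getTime(), content);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad timestamp: " + timestamp, e);
        }
    }
}
```

With this in place, the value of entry.timestampMillis would go into timestampField.setLongValue(...), and the same millisecond values would serve as the bounds of a NumericRangeQuery on the "timestamp" field.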
Re: about some seacher(I'm new hand, thank you for help)
Thank you very much. That works for me.

--- On Tue, Jul 10, 2012, feng lu [via Lucene] wrote:

From: feng lu [via Lucene]
Subject: Re: about some seacher(I'm new hand, thank you for help)
To: "sam"
Date: Tuesday, July 10, 2012, 4:25 PM

> hi sam
>
> you can add content and time stamp field like this.
>
>     // add content
>     doc.add(new Field("contents", content, Field.Store.NO, Field.Index.ANALYZED));
>     // add timestamp
>     NumericField timestampField = new NumericField("timestamp");
>     timestampField.setLongValue(DateField.stringToTime(timestamp));
>     doc.add(timestampField);
>
> To perform range querying or filtering against a NumericField, use
> NumericRangeQuery or NumericRangeFilter.
>
> you can see the http://wiki.apache.org/lucene-java/SearchNumericalFields
> to find any useful information.
>
> On Tue, Jul 10, 2012 at 3:38 PM, sam <[hidden email]> wrote:
>
> > timestamp
>
> --
> Don't Grow Old, Grow Up... :-)

-- View this message in context: http://lucene.472066.n3.nabble.com/about-some-seacher-I-m-new-hand-thank-you-for-help-tp3993397p3994102.html
Re: index.merge.scheduler exception - java.io.IOException: Input/output error
If you can live with the loss of 385395 documents, running with -fix is an option. I'd create a new index. I'd also worry about why the existing index got messed up in the first place.

I've no idea about running fsck on ec2 file systems. General file system commands hanging for 10 secs doesn't sound good - I'd worry about that first of all.

--
Ian.

On Mon, Jul 9, 2012 at 11:17 PM, T Vinod Gupta wrote:

> thanks this was really helpful to understand whats going on..
> i got these for 2 of my indexes -
>
> WARNING: 29 broken segments (containing 385395 documents) detected
> WARNING: would write new segments file, and 385395 documents would be lost,
> if -fix were specified
>
> WARNING: 29 broken segments (containing 385395 documents) detected
> WARNING: would write new segments file, and 385395 documents would be lost,
> if -fix were specified
>
> so my only option is to run with -fix and live with the data loss right? no
> other option right?
> will running fsck help? sometimes my ls or less commands also hang for a
> good 10 sec.. this somehow indicates that there is some corruption.
>
> thanks
>
> On Mon, Jul 9, 2012 at 6:27 AM, Erick Erickson wrote:
>
>> no, you can't delete those files, and you can't regenerate just those
>> files, all the various segment files are necessary and intertwined...
>>
>> Consider using the CheckIndex facility, see:
>> http://solr.pl/en/2011/01/17/checkindex-for-the-rescue/
>>
>> note, the CheckIndex class is contained in the lucene core jar
>>
>> You can run it with the -fix option to repair (at, perhaps, the expense
>> of loss of some documents) if you choose, but running it without
>> that option first is probably a good idea..
>>
>> Best
>> Erick
>>
>> On Mon, Jul 9, 2012 at 7:43 AM, T Vinod Gupta wrote:
>> > this is on local file system on amazon ec2 host. the file system was fine
>> > until a week ago when the outage happened and there were probably some
>> > system glitches. i have seen this issue since then.. sometimes regular
>> > commands like less or ls hang for many seconds even though there is no
>> > cpu/memory pressure on the machine.
>> >
>> > in my case, there are only 2 unique entries for which i see this error.
>> > one for a .fdt file and one for a .tis file. is it possible to regenerate
>> > those files somehow? if i delete those 2 files, will the entire index get
>> > corrupted? im ok to live with some data loss if it makes it more stable
>> > and performant.
>> >
>> > thanks
>> >
>> > On Mon, Jul 9, 2012 at 2:28 AM, Ian Lea wrote:
>> >
>> >> Is this on a local or remote file system? Is the file system itself
>> >> OK? Is something else messing with your lucene index at the same
>> >> time?
>> >>
>> >> --
>> >> Ian.
>> >>
>> >> On Sun, Jul 8, 2012 at 8:58 PM, T Vinod Gupta wrote:
>> >> > Hi,
>> >> > My log files are showing the below exceptions almost at twice a minute
>> >> > frequency. what is causing it and how can i fix it? I am not using
>> >> > lucene directly but instead using elasticsearch (0.18.7 version). but
>> >> > since the stack trace is all lucene, i am sending it to this mailing
>> >> > list.
>> >> >
>> >> > also, my queries are taking a long time to execute (sometimes take a
>> >> > minute). could this be contributing to it somehow?
>> >> >
>> >> > [2012-07-08 19:44:19,887][WARN ][index.merge.scheduler] [name>] [twitter][4] failed to merge
>> >> > java.io.IOException: Input/output error: NIOFSIndexInput(path="/media/ephemeral0/ES_data/elasticsearch/nodes/0/indices/twitter/4/index/_2h29k.tis")
>> >> >     at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:180)
>> >> >     at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:229)
>> >> >     at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
>> >> >     at org.apache.lucene.store.DataInput.readVInt(DataInput.java:105)
>> >> >     at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:197)
>> >> >     at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:86)
>> >> >     at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:133)
>> >> >     at org.apache.lucene.index.SegmentMergeInfo.next(SegmentMergeInfo.java:72)
>> >> >     at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:546)
>> >> >     at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:473)
>> >> >     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:109)
>> >> >     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4295)
>> >> >     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3940)
>> >> >     at org.apache.lucene.inde
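Since the trace reports an operating-system level "Input/output error" on specific files, one way to check the file system independently of Lucene is to read every file in the index directory from start to end and list the ones that fail. The following is a minimal JDK-only sketch of that idea; the class name ReadabilityScan and its methods are made up for illustration, and it is no substitute for CheckIndex, which validates the index structure itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic: finds files that cannot be read completely. */
final class ReadabilityScan {

    /** Reads every regular file directly under dir; returns those that raised an IOException. */
    static List<Path> findUnreadable(Path dir) throws IOException {
        List<Path> unreadable = new ArrayList<Path>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file) && !canReadFully(file)) {
                    unreadable.add(file);
                }
            }
        }
        return unreadable;
    }

    /** Returns true if the whole file could be read, false if any read failed. */
    static boolean canReadFully(Path file) {
        byte[] buffer = new byte[1 << 16];
        try (InputStream in = Files.newInputStream(file)) {
            while (in.read(buffer) != -1) {
                // Contents are discarded; only the success of the read matters.
            }
            return true;
        } catch (IOException e) {
            System.err.println("read failed: " + file + " (" + e.getMessage() + ")");
            return false;
        }
    }
}
```

If this reports the same .fdt and .tis files that appear in the log, that would point at the disk or file system rather than at Lucene, which supports building a new index on healthy storage instead of repairing in place.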
Re: Spatial Search
Amir,

CachedDistanceValueSource is indeed poorly named; I've identified this before and need to get renaming it onto the TODO list. Calculating the distance is computationally cheap enough for the X results (top-20-ish) you return in your search results that it isn't worth trying to cache it, although I don't rule out caching it at some point.

On timing... know that the Lucene spatial module was committed in ~March, and there has been steady work lately on the various components involved (Spatial4j, Lucene spatial module, Solr adapters). I *really* want to get this nailed down for Lucene/Solr 4. There is a big difference between simply having working code (that is only partially tested but seems to work) and addressing documentation, full testing, and consensus on the API between interested parties (e.g. Chris, Ryan, and me). The last bit, consensus, is what bogs things down, in my experience.

Specifically about the 1/distance thing... not sure when that'll happen, maybe in a couple of weeks. Maybe. I created a JIRA issue so you can start watching it to be notified of progress:
https://issues.apache.org/jira/browse/LUCENE-4208

~ David

- Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book

-- View this message in context: http://lucene.472066.n3.nabble.com/Spatial-Search-tp3623494p3994211.html
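To give a feel for why computing a distance for only the top 20 or so hits is cheap, here is a self-contained haversine sketch: a handful of trigonometric calls per document. This is not the spatial module's or Spatial4j's code; the class name GreatCircle, the choice of the haversine formula, and the mean Earth radius of 6371 km are assumptions made for illustration.

```java
/** Hypothetical helper: great-circle distance between two lat/lon points. */
final class GreatCircle {

    // ASSUMPTION: spherical Earth with a mean radius of 6371 km.
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Haversine distance in kilometres; all arguments are in decimal degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double sinHalfLat = Math.sin(dLat / 2);
        double sinHalfLon = Math.sin(dLon / 2);
        // a is the squared half-chord length between the two points on a unit sphere.
        double a = sinHalfLat * sinHalfLat
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * sinHalfLon * sinHalfLon;
        // atan2 form of the central angle, numerically stable for small distances.
        return 2 * EARTH_RADIUS_KM * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }
}
```

Calling haversineKm once per returned hit costs a few sin/cos/atan2 evaluations, which is negligible next to the search itself for a page of about 20 results.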
Re: Index of Lucene
A much clearer explanation than the wiki! Thanks!

-- View this message in context: http://lucene.472066.n3.nabble.com/Index-of-Lucene-tp555857p3994239.html
storing pre-analyzed fields
I have a question about the API for storing and indexing lucene documents (in 3.x).

If I want to index a document by providing a TokenStream, I can do that by calling document.add(field) where field is something I write deriving from AbstractField that returns the TokenStream for tokenStreamValue(), and nothing for stringValue() or readerValue().

Now if I also want to store a value for that field, do I just add a different field with different options (eg stored=true, and the field a normal Field)?

Do these two things conflict in any way? Do I have to be careful about the order in which I do them? Or is it just a mildly weird API with no lurking ill effects? :)

Also: I have been seeing various e-mails about changes to this API so I assume it's all different in 4.0; if you want to take this opportunity to explain that, please go ahead, but for now I am working with the 3.x API.

Thanks

-Mike Sokolov
RE: storing pre-analyzed fields
Hi Mike,

The order does not matter at all, in all versions of Lucene. You also don't need to subclass AbstractField (but you can use e.g. NumericField as an example); it is enough to use new Field(name, TokenStream). If you also want to store this field, simply add a stored-only field with the *same* name (in addition to the TokenStream one).

In Lucene 4.0 we are moving towards separating the "Document" objects used for indexing from those returned by IndexReader/IndexSearcher, because they are two different things and the latter contain only stored fields. But this does not affect anything here.

In all Lucene versions, stored field values and indexed values are completely decoupled and do not relate to each other at all. Adding a Field as stored+indexed is just a convenience; you can also add it twice (once as stored, once as indexed - I prefer to always do this), in any order. The resulting index will be identical (don't compare files; there will be differences in headers!).

Order matters in one respect: fields with the same name and the same type rely on order. Two stored fields with the same name are returned in the same order by IndexReader/IndexSearcher, and two indexed fields with the same name yield the same token order for e.g. PhraseQuery or SpanQuery only if the fields are added in a fixed order. But you can interleave the Field instances of each type as you like.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Michael Sokolov [mailto:soko...@ifactory.com]
> Sent: Wednesday, July 11, 2012 2:54 AM
> To: java-user@lucene.apache.org
> Subject: storing pre-analyzed fields
>
> I have a question about the API for storing and indexing lucene documents
> (in 3.x).
>
> If I want to index a document by providing a TokenStream, I can do that by
> calling document.add (field) where field is something I write deriving from
> AbstractField that returns the TokenStream for tokenStreamValue(), and
> nothing for stringValue() or readerValue().
>
> Now if I also want to store a value for that field, do I just add a
> different field with different options (eg stored=true, and the field a
> normal Field)?
>
> Do these two things conflict in any way? Do I have to be careful about the
> order in which I do them? Or is it just a mildly weird API with no lurking
> ill effects? :)
>
> Also: I have been seeing various e-mails about changes to this API so I
> assume it's all different in 4.0; if you want to take this opportunity to
> explain that, please go ahead, but for now I am working with the 3.x API.
>
> Thanks
>
> -Mike Sokolov