Re: Upgrade to 3.6 OR wait for 4.0

2012-07-10 Thread Shai Erera
I have to use stable versions too, which is why I delayed upgrading my
code until 4.0-ALPHA was out. Since API breaks are not a problem for me,
i.e. I'm only concerned with index format back-compat, 4.0-ALPHA was
stable enough for me.

If you require both a stable index format and a stable API, then wait for
4.0-BETA.

4.0 will probably include more hardening of the code after 4.0-BETA, which
likely means bug fixes and such. If that is your definition of 'stable',
then wait for it.

As for timelines, I have no idea :). It took nearly a year to stabilize the
code (and index format) enough for 4.0-ALPHA to be released. I hope that
4.0-BETA and 4.0.0 won't be long from now :)

Shai

On Tue, Jul 10, 2012 at 9:21 AM, Ganesh  wrote:

> Thanks for the reply. Any idea how long it will take to get to a stable
> 4.0 release? I want to move to v4.0, but I can only use a stable
> version.
>
> Regards
> Ganesh
>
>
> - Original Message -
> From: "Shai Erera" 
> To: 
> Sent: Tuesday, July 10, 2012 10:50 AM
> Subject: Re: Upgrade to 3.6 OR wait for 4.0
>
>
> > Hi Ganesh
> >
> > I recently upgraded my code to 3.6, and yesterday finished part of my
> > upgrades to 4.0-ALPHA.
> >
> > Upgrading from 3.0.3 to 3.6 is relatively easy, as all API should be
> > backwards compatible. But I think there were some API breaks and
> > back-compat issues. Therefore, if I were you, I'd first upgrade from
> > 3.0.3 to 3.6, resolving all 'deprecated' API warnings and making sure
> > the back-compat issues do not affect me (or resolving them too!).
> >
> > Then, I'd upgrade to 4.0-ALPHA. A lot of API has changed, so most
> > likely you'll need to touch large parts of your code again.
> >
> > Going this route, you gain all the new features and enhancements of 3.6,
> > while knowing that you run on a 'stable' Lucene version. Upgrading to
> > 4.0-ALPHA comes with even more gains, but this release will probably
> > undergo some API changes (the API is expected to freeze in BETA), though
> > the index format is not going to change in incompatible ways (unless
> > there's a bug ... you can read the release notes). So it depends on how
> > much risk you are willing to take upgrading to code that is still a
> > 'work in progress'.
> >
> > Hope this helps.
> >
> > Shai
> >
> > On Tue, Jul 10, 2012 at 7:28 AM, Ganesh  wrote:
> >
> >> Hello all,
> >>
> >> I am currently using v3.0.3 and planning to upgrade to v3.6. Should I go
> >> ahead with the upgrade or wait for 4.0?
> >>
> >> Regards
> >> Ganesh
> >>
> >> -
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>
> >>
> >
>


Re: about some seacher(I'm new hand, thank you for help)

2012-07-10 Thread sam
But how can I use it in Lucene?

File logFile = new File("D:\\logFile");
BufferedReader reader = new BufferedReader(new FileReader(logFile));
String str = null;
while ((str = reader.readLine()) != null) {
    // characters 1..12 hold the timestamp, the rest is the message
    String timestamp = str.substring(1, 13);
    String content = str.substring(14).trim();
}

This way we can get the data, but

document.add(new Field("content", content,
        Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));

must be wrong.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/about-some-seacher-I-m-new-hand-thank-you-for-help-tp3993397p3994093.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




Re: about some seacher(I'm new hand, thank you for help)

2012-07-10 Thread feng lu
Hi sam,

You can add the content and timestamp fields like this:

// add content
doc.add(new Field("contents", content, Field.Store.NO, Field.Index.ANALYZED));
// add timestamp
NumericField timestampField = new NumericField("timestamp");
timestampField.setLongValue(DateField.stringToTime(timestamp));
doc.add(timestampField);

To perform range querying or filtering against a NumericField, use
NumericRangeQuery or NumericRangeFilter. See
http://wiki.apache.org/lucene-java/SearchNumericalFields for more
information.
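
A note on the timestamp line above: as far as I recall, DateField.stringToTime
only decodes strings that DateField itself produced, so a raw log timestamp
would probably need its own parsing first. Below is a minimal sketch in plain
Java, assuming each log line looks like "[HH:mm:ss.SSS] message" (that layout
is a guess that happens to fit substring(1, 13) and substring(14)).
LogLineParser and its method names are made up for illustration. The resulting
long is the kind of value setLongValue() accepts.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper, not part of Lucene: splits one log line into timestamp and content. */
final class LogLineParser {

    // Assumed layout: "[HH:mm:ss.SSS] message" (an assumption, not taken from the thread).
    private static final String PATTERN = "HH:mm:ss.SSS";

    /** Returns the time of day as milliseconds since midnight, interpreted as UTC. */
    static long timestampMillis(String line) {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        format.setLenient(false);
        try {
            // Characters 1..12 hold the timestamp; character 0 is the opening bracket.
            return format.parse(line.substring(1, 13)).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unexpected timestamp in: " + line, e);
        }
    }

    /** Returns everything after the closing bracket, trimmed. */
    static String content(String line) {
        return line.substring(14).trim();
    }

    public static void main(String[] args) {
        String line = "[00:00:01.500] server started";
        System.out.println(timestampMillis(line) + " -> " + content(line));
    }
}
```

With the sample line in main this prints "1500 -> server started". If the real
log carries a full date, the pattern would have to change accordingly.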

On Tue, Jul 10, 2012 at 3:38 PM, sam  wrote:

> timestamp




-- 
Don't Grow Old, Grow Up... :-)


Re: about some seacher(I'm new hand, thank you for help)

2012-07-10 Thread sam
Thank you very much. That works for me.

--- On Tue, Jul 10, 2012, feng lu [via Lucene] wrote:

From: feng lu [via Lucene]
Subject: Re: about some seacher(I'm new hand, thank you for help)
To: "sam"
Date: Tuesday, July 10, 2012, 4:25 PM

Hi sam,

You can add the content and timestamp fields like this:

// add content
doc.add(new Field("contents", content, Field.Store.NO, Field.Index.ANALYZED));
// add timestamp
NumericField timestampField = new NumericField("timestamp");
timestampField.setLongValue(DateField.stringToTime(timestamp));
doc.add(timestampField);

To perform range querying or filtering against a NumericField, use
NumericRangeQuery or NumericRangeFilter. See
http://wiki.apache.org/lucene-java/SearchNumericalFields for more
information.

On Tue, Jul 10, 2012 at 3:38 PM, sam <[hidden email]> wrote:

> timestamp

-- 
Don't Grow Old, Grow Up... :-)


--
View this message in context: 
http://lucene.472066.n3.nabble.com/about-some-seacher-I-m-new-hand-thank-you-for-help-tp3993397p3994102.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

Re: index.merge.scheduler exception - java.io.IOException: Input/output error

2012-07-10 Thread Ian Lea
If you can live with the loss of 385395 documents, running with -fix
is an option.  I'd create a new index.  I'd also worry about why the
existing index got messed up in the first place.

I've no idea about running fsck on ec2 file systems. General file
system commands hanging for 10 secs doesn't sound good - I'd worry
about that first of all.


--
Ian.


On Mon, Jul 9, 2012 at 11:17 PM, T Vinod Gupta  wrote:
> Thanks, this was really helpful for understanding what's going on.
> I got these for 2 of my indexes -
>
> WARNING: 29 broken segments (containing 385395 documents) detected
> WARNING: would write new segments file, and 385395 documents would be lost,
> if -fix were specified
>
> WARNING: 29 broken segments (containing 385395 documents) detected
> WARNING: would write new segments file, and 385395 documents would be lost,
> if -fix were specified
>
> So my only option is to run with -fix and live with the data loss, right?
> No other option?
> Will running fsck help? Sometimes my ls or less commands also hang for a
> good 10 sec, which suggests there is some corruption.
>
> thanks
>
> On Mon, Jul 9, 2012 at 6:27 AM, Erick Erickson wrote:
>
>> no, you can't delete those files, and you can't regenerate just those
>> files, all the various segment files are necessary and intertwined...
>>
>> Consider using the CheckIndex facility, see:
>> http://solr.pl/en/2011/01/17/checkindex-for-the-rescue/
>>
>> note, the CheckIndex class is contained in the lucene core jar
>>
>> You can run it with the -fix option to repair (at, perhaps, the expense
>> of loss of some documents) if you choose, but running it without
>> that option first is probably a good idea..
>>
>> Best
>> Erick
>>
>> On Mon, Jul 9, 2012 at 7:43 AM, T Vinod Gupta wrote:
>> > this is on local file system on amazon ec2 host. the file system was fine
>> > until a week ago when the outage happened and there were probably some
>> > system glitches. i have seen this issue since then.. sometimes regular
>> > commands like less or ls hang for many seconds even though there is no
>> > cpu/memory pressure on the machine.
>> >
>> > in my case, there are only 2 unique entries for which i see this error.
>> > one for a .fdt file and one for a .tis file. is it possible to regenerate
>> > those files somehow? if i delete those 2 files, will the entire index get
>> > corrupted? im ok to live with some data loss if it makes it more stable
>> > and performant.
>> >
>> > thanks
>> >
>> > On Mon, Jul 9, 2012 at 2:28 AM, Ian Lea  wrote:
>> >
>> >> Is this on a local or remote file system?  Is the file system itself
>> >> OK?  Is something else messing with your lucene index at the same
>> >> time?
>> >>
>> >>
>> >> --
>> >> Ian.
>> >>
>> >>
>> >> On Sun, Jul 8, 2012 at 8:58 PM, T Vinod Gupta wrote:
>> >> > Hi,
>> >> > My log files are showing the below exceptions almost at twice a minute
>> >> > frequency. what is causing it and how can i fix it? I am not using
>> >> > lucene directly but instead using elasticsearch (0.18.7 version). but
>> >> > since the stack trace is all lucene, i am sending it to this mailing list.
>> >> >
>> >> > also, my queries are taking a long time to execute (sometimes take a
>> >> > minute). could this be contributing to it somehow?
>> >> >
>> >> > [2012-07-08 19:44:19,887][WARN ][index.merge.scheduler] [> >> > name>] [twitter][4] failed to merge
>> >> > java.io.IOException: Input/output error: NIOFSIndexInput(path="/media/ephemeral0/ES_data/elasticsearch/nodes/0/indices/twitter/4/index/_2h29k.tis")
>> >> > at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:180)
>> >> > at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:229)
>> >> > at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
>> >> > at org.apache.lucene.store.DataInput.readVInt(DataInput.java:105)
>> >> > at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:197)
>> >> > at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:86)
>> >> > at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:133)
>> >> > at org.apache.lucene.index.SegmentMergeInfo.next(SegmentMergeInfo.java:72)
>> >> > at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:546)
>> >> > at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:473)
>> >> > at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:109)
>> >> > at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4295)
>> >> > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3940)
>> >> > at org.apache.lucene.inde

Re: Spatial Search

2012-07-10 Thread David Smiley (@MITRE.org)
Amir,

CachedDistanceValueSource is indeed poorly named; renaming it needs to go
on the TODO list, and I've noted this before.  Calculating the distance is
computationally cheap enough for the X number of results (top-20-ish) you
are returning in your search results that it isn't worth trying to cache
it, although I don't rule out caching it at some point.
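
To give a feel for how little work one distance calculation is, here is a
rough sketch of a great-circle (haversine) distance in plain Java. It is only
an illustration: HaversineSketch and the mean Earth radius constant are
assumptions made for this example, and this is not necessarily how the Lucene
spatial module or Spatial4j compute distances.

```java
/** Hypothetical helper for illustration; not the Lucene spatial or Spatial4j API. */
final class HaversineSketch {

    // Mean Earth radius in kilometres (an assumption; other radii are in common use).
    static final double EARTH_RADIUS_KM = 6371.0088;

    /** Great-circle distance in kilometres between two lat/lon points given in degrees. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 so rounding error cannot push asin out of its domain.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    public static void main(String[] args) {
        // A quarter of the way around the equator.
        System.out.println(distanceKm(0, 0, 0, 90));
    }
}
```

A handful of trigonometric calls per document, repeated for roughly 20 hits,
is negligible next to the rest of a search request.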

On timing... know that the Lucene spatial module was committed in ~March,
and there has been steady work lately on various components involved
(Spatial4j, Lucene spatial module, Solr adapters).  I *really* want to get
this nailed down for Lucene/Solr 4.  There is a big difference between
simply having working code (that is only partially tested but seems to
work), and addressing documentation, full testing, and consensus on the API
between interested parties (e.g. Chris, Ryan, and me).  The last bit,
consensus, is what bogs things down, in my experience.

Specifically about the 1/distance thing... not sure when that'll happen,
maybe in a couple weeks.  Maybe.  I created a JIRA issue so you can start
watching it to be notified of progress:
https://issues.apache.org/jira/browse/LUCENE-4208

~ David

-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spatial-Search-tp3623494p3994211.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




Re: Index of Lucene

2012-07-10 Thread nanshi
A much clearer explanation than the wiki! Thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-of-Lucene-tp555857p3994239.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




storing pre-analyzed fields

2012-07-10 Thread Michael Sokolov
I have a question about the API for storing and indexing lucene 
documents (in 3.x).


If I want to index a document by providing a TokenStream, I can do that 
by calling document.add (field) where field is something I write 
deriving from AbstractField that returns the TokenStream for 
tokenStreamValue(), and nothing for stringValue() or readerValue().


Now if I also want to store a value for that field, do I just add a 
different field with different options (eg stored=true, and the field a 
normal Field)?


Do these two things conflict in any way?  Do I have to be careful about 
the order in which I do them?  Or is it just a mildly weird API with no 
lurking ill effects? :)


Also: I have been seeing various e-mails about changes to this API so I 
assume it's all different in 4.0; if you want to take this opportunity 
to explain that, please go ahead, but for now I am working with the 3.x API.


Thanks

-Mike Sokolov




RE: storing pre-analyzed fields

2012-07-10 Thread Uwe Schindler
Hi Mike,

The order does not matter at all, in any version of Lucene. You also don't
need to subclass AbstractField (but if you do, see e.g. NumericField as an
example); it is enough to use new Field(name, TokenStream). If you also
want to store this field, simply add a stored-only field with the *same*
name (in addition to the TokenStream one).

In Lucene 4.0 we are moving toward separating the "Document" objects used
for indexing from those returned by IndexReader/Searcher, because they are
two different things and the latter only return stored fields. But this
does not affect anything here.

In all Lucene versions, stored field values and indexed values are
completely decoupled and do not relate to each other at all. Adding a Field
as stored+indexed is just a convenience; you can also add it twice (once as
stored, once as indexed - I prefer to always do this) in any order. The
resulting index will be identical (don't compare files; there will be
differences in headers!).

Order matters in one respect: fields with the same name and the same type
rely on it. Two stored fields with the same name are returned by
IndexReader/-Searcher in the order they were added, and two indexed fields
with the same name produce the same positions for e.g. PhraseQuery or
SpanQuery only if the Field order is fixed. But you can interleave the
Field instances of the different types as you like.
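
To make the ordering point concrete, here is a toy model in plain Java. It is
not Lucene code: FieldOrderSketch is a made-up class that simply concatenates
the whitespace tokens of same-named field values in the order they were added,
assuming no position gap between the instances (which I believe is the default
behaviour, but treat that as an assumption). It shows why a phrase that spans
two instances only matches for one order of adding them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (not Lucene code) of token positions across same-named field instances. */
final class FieldOrderSketch {

    /** Concatenates the whitespace tokens of each value in the order the values were added. */
    static List<String> positions(String... fieldValues) {
        List<String> tokens = new ArrayList<String>();
        for (String value : fieldValues) {
            tokens.addAll(Arrays.asList(value.trim().split("\\s+")));
        }
        return tokens;
    }

    /** True if the two terms sit at adjacent positions, which an exact phrase requires. */
    static boolean phraseMatches(List<String> tokens, String first, String second) {
        for (int i = 0; i + 1 < tokens.size(); i++) {
            if (tokens.get(i).equals(first) && tokens.get(i + 1).equals(second)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same two values, added in different order.
        System.out.println(phraseMatches(positions("quick brown", "fox jumps"), "brown", "fox"));
        System.out.println(phraseMatches(positions("fox jumps", "quick brown"), "brown", "fox"));
    }
}
```

The first call prints true and the second false: the values are identical, only
the order in which the same-named instances were added differs.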

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Michael Sokolov [mailto:soko...@ifactory.com]
> Sent: Wednesday, July 11, 2012 2:54 AM
> To: java-user@lucene.apache.org
> Subject: storing pre-analyzed fields
> 
> I have a question about the API for storing and indexing lucene documents
> (in 3.x).
> 
> If I want to index a document by providing a TokenStream, I can do that by
> calling document.add (field) where field is something I write deriving
> from AbstractField that returns the TokenStream for tokenStreamValue(), and
> nothing for stringValue() or readerValue().
> 
> Now if I also want to store a value for that field, do I just add a
> different field with different options (eg stored=true, and the field a
> normal Field)?
> 
> Do these two things conflict in any way?  Do I have to be careful about
> the order in which I do them?  Or is it just a mildly weird API with no
> lurking ill effects? :)
> 
> Also: I have been seeing various e-mails about changes to this API so I
> assume it's all different in 4.0; if you want to take this opportunity to
> explain that, please go ahead, but for now I am working with the 3.x API.
> 
> Thanks
> 
> -Mike Sokolov
> 

