AndrzejBialecki to the ContributorsGroup. Thanks!
--
Best regards,
Andrzej Bialecki
http://www.sigram.com, blog http://www.sigram.com/blog
___.,___,___,___,_._. __
[___||.__|__/|__||\/|: Information Retrieval, System Integration
___|||__||..\|..||..|: Contact
the reading. :)
--
Best regards,
Andrzej Bialecki
http://www.sigram.com, blog http://www.sigram.com/blog
___.,___,___,___,_._. __
[___||.__|__/|__||\/|: Information Retrieval, System Integration
___|||__||..\|..||..|: Contact: info at sigram dot com
positions and offsets if available (or
blanks if not available).
--
Best regards,
Andrzej Bialecki
fields or in an
external system.
--
Best regards,
Andrzej Bialecki
is whether the space savings would be worth the complication?
--
Best regards,
Andrzej Bialecki
implementation must be inherited from
FSDirectory (mitja.lenic)
* Issue 21: luke tarball needs to extract to a luke directory
(bevan.koopman, Photodeus)
* Issue 27: Cannot add or edit documents using StandardAnalyzer
(dean.thrasher)
Thanks to all contributors. Enjoy!
--
Best regards,
Andrzej
pairs.
--
Best regards,
Andrzej Bialecki
.
LUCENE-3837, to be specific. But as you said, it's still early and there
is no code yet to speak of...
--
Best regards,
Andrzej Bialecki
___. ___ ___ ___ _ _ __
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded
is lower than the current lowest score.
--
Best regards,
Andrzej Bialecki
___. ___ ___ ___ _ _ __
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram
On 29/03/2012 11:14, Andrzej Bialecki wrote:
The problem in our implementation is that we use a within-document term
frequency (the number of occurrences of t in the current document) and
not a collection-wide term frequency... so, it looks to me that the fix
would be to first fully traverse
enumeration and calculate the
total number of term occurrences in all documents (e.g. in
RIDFTermPruningPolicy.initPositionsTerm(..) ), and use this value in the
formula in place of termPositions.freq().
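
A minimal sketch of the difference between the two frequencies, written in plain Java with no Lucene dependency. The class and method names are hypothetical, and the formula is the textbook residual IDF (observed IDF minus the IDF expected under a Poisson model), which may differ in detail from what RIDFTermPruningPolicy actually computes. The point is only that the collection-wide count is the sum of the per-document counts, obtained by walking the whole enumeration first.

```java
final class CollectionFrequencySketch {

    /** Sums the within-document frequencies of one term over all of its postings. */
    static long collectionFrequency(int[] perDocFreqs) {
        long total = 0;
        for (int freq : perDocFreqs) {
            total += freq;
        }
        return total;
    }

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /**
     * Textbook residual IDF (an assumption, not taken from the Lucene source):
     * observed IDF minus the IDF a Poisson model predicts from the
     * collection-wide term frequency.
     */
    static double ridf(long collectionFreq, int docFreq, int numDocs) {
        double observedIdf = -log2((double) docFreq / numDocs);
        double expectedIdf = -log2(1.0 - Math.exp(-(double) collectionFreq / numDocs));
        return observedIdf - expectedIdf;
    }

    public static void main(String[] args) {
        int[] freqs = {3, 1, 2}; // hypothetical per-document frequencies of one term
        long cf = collectionFrequency(freqs);
        System.out.println("cf=" + cf + " ridf=" + ridf(cf, freqs.length, 1000));
    }
}
```

Passing a single document's frequency instead of the summed value would make the expected-IDF term depend on whichever document happens to be current, which is the problem described above.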
--
Best regards,
Andrzej Bialecki
, and then see if it's good enough.
--
Best regards,
Andrzej Bialecki
and a happy New Year to you all! :)
--
Best regards,
Andrzej Bialecki
that supports Unicode characters, the
default platform font often doesn't support them, which results in '?'
or other strange characters.
--
Best regards,
Andrzej Bialecki
bit-level distance in their hashes from the query hash.
The solution is described in SOLR-1918 - Bit-wise scoring field type.
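
For illustration only, a bit-level distance between two hashes can be computed as a Hamming distance. The sketch below assumes 64-bit hashes and uses hypothetical names. It is not the SOLR-1918 implementation, just the underlying idea in plain Java.

```java
final class HashDistanceSketch {

    /** Number of bit positions in which the two 64-bit hashes differ. */
    static int hammingDistance(long hashA, long hashB) {
        return Long.bitCount(hashA ^ hashB);
    }

    /** Maps the distance to a similarity in [0, 1]; 1.0 means identical hashes. */
    static double bitSimilarity(long hashA, long hashB) {
        return 1.0 - hammingDistance(hashA, hashB) / 64.0;
    }

    public static void main(String[] args) {
        long queryHash = 0b1011L;    // hypothetical hash values
        long sentenceHash = 0b0010L;
        System.out.println(hammingDistance(queryHash, sentenceHash));
        System.out.println(bitSimilarity(queryHash, sentenceHash));
    }
}
```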
--
Best regards,
Andrzej Bialecki
On 31/10/2011 21:42, Petite Abeille wrote:
On Oct 31, 2011, at 9:32 PM, Andrzej Bialecki wrote:
similarity-preserving hash function was calculated on each sentence, and the
hash was added as a field. The property of the hash was that similar documents
(sentences) would produce a similar
.
* Rearranged field flags so that they are more logical and cover
index options added in 3.4.0. E.g. omitNorms is represented as
with Norms and marked by N, IndexOptions are expanded to Idfp
to mark indexed fields with docs, freqs and positions.
Enjoy!
--
Best regards,
Andrzej Bialecki
. There's probably some
lesson to learn from this situation...
I committed a fix, and the updated release is marked as 3.4.0_1. Sorry!
--
Best regards,
Andrzej Bialecki
Hi all,
Luke 3.3.0 has been released and is available for download here:
http://code.google.com/p/luke/
Apart from the updated Lucene libraries there were no changes in
functionality.
--
Best regards,
Andrzej Bialecki
modify norms directly using IndexReader.setNorm(...) but
you need to remember that this method uses raw byte values, that is the
result of encoding a floating point value with
Similarity.encodeNormValue(..).
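
To show why the raw byte matters, here is a self-contained sketch of a one-byte float encoding modeled on Lucene's SmallFloat.floatToByte315 and byte315ToFloat. The constants are reproduced from memory and should be treated as an assumption. In real code, call Similarity.encodeNormValue(..) rather than copying this. The class name is hypothetical.

```java
final class NormEncodingSketch {

    /** Encodes a float into one byte; nearby values collapse to the same byte. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallFloat = bits >> (24 - 3);
        if (smallFloat <= ((63 - 15) << 3)) {
            // Zero and negative values map to 0, tiny positive values to 1.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallFloat >= ((63 - 15) << 3) + 0x100) {
            return (byte) -1; // overflow: largest representable value
        }
        return (byte) (smallFloat - ((63 - 15) << 3));
    }

    /** Decodes the byte back to the float it represents. */
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        for (float f : new float[] {0.5f, 0.55f, 0.6f, 0.625f, 1.0f}) {
            byte raw = encode(f);
            System.out.println(f + " -> raw byte " + raw + " -> " + decode(raw));
        }
    }
}
```

The value handed to setNorm(...) would be the encoded byte, not the original float, and several distinct floats share one byte.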
--
Best regards,
Andrzej Bialecki
://people.ischool.berkeley.edu/~hearst/research/tilebars.html
--
Best regards,
Andrzej Bialecki
, patches and comments.
Happy Luke-ing!
--
Best regards,
Andrzej Bialecki
obtained from the full index, and then you use
this map to calculate IDF.
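
A rough sketch of that idea, assuming the map holds term to document-frequency pairs collected from the full index (the start of the message is cut off, so the map's exact contents are an assumption). The formula is the classic 1 + ln(numDocs / (docFreq + 1)) used by Lucene's default similarity; all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class GlobalIdfSketch {

    /** IDF from full-index statistics; unseen terms are treated as docFreq 0. */
    static double idf(Map<String, Integer> globalDocFreq, String term, int globalNumDocs) {
        int docFreq = globalDocFreq.getOrDefault(term, 0);
        return 1.0 + Math.log(globalNumDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        Map<String, Integer> globalDocFreq = new HashMap<>();
        globalDocFreq.put("lucene", 9); // hypothetical statistics from the full index
        System.out.println(idf(globalDocFreq, "lucene", 1000));
    }
}
```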
--
Best regards,
Andrzej Bialecki
On 2010-07-07 14:49, Naveen Kumar wrote:
Hi Andrzej Bialecki
When you suggested -
There are some other low-level ways to do this, but the easiest is to
use a FilterIndexReader, especially since you just want to add a
stored
field - implement a subclass of FilterIndexReader
. You
have been warned :)
--
Best regards,
Andrzej Bialecki
the Reconstruct & Edit
functionality in Luke (http://www.getopt.org/luke).
--
Best regards,
Andrzej Bialecki
in the
output index.
--
Best regards,
Andrzej Bialecki
On 2010-05-31 10:54, Uwe Schindler wrote:
No.
See also LUCENE-2048 (nice round number ;) ).
--
Best regards,
Andrzej Bialecki
need such kind of access in your application then add your
documents with term vectors with offsets and positions. Even then,
depending on the Analyzer you used, the process is lossy - some input
data that was discarded by Analyzer is simply no longer available.
--
Best regards,
Andrzej Bialecki
that?
Yes, see the discussion here:
https://issues.apache.org/jira/browse/LUCENE-2393
--
Best regards,
Andrzej Bialecki
will need twice as much space. But in this case perhaps you could
put the original index on a network FS, and split it into the target
partition - the data would be read just once.
--
Best regards,
Andrzej Bialecki
plugin (and analyzers) don't work.
* Issue 4 : Compress flag no longer available.
* Issue 14 : Error while using custom similarity.
Enjoy!
--
Best regards,
Andrzej Bialecki
.
I'll commit the current mostly-working state today, you can take a look
- you've written some cool Luke plugins before .. ;)
--
Best regards,
Andrzej Bialecki
could store such information in
IndexCommit.getUserData(). The lack of standardized metadata is an
issue, of course - we could start experimenting with this in Luke, to
see whether we can squeeze a subset of Solr schema there.
--
Best regards,
Andrzej Bialecki
this parser out of the box. I
expect to make a release within a few days. Watch the commits on the
Google code project ...
--
Best regards,
Andrzej Bialecki
).
--
Best regards,
Andrzej Bialecki
implement your own Fieldable, and return what you want
from its methods. You can also use Field constructor that takes the
stored value, and then use Field.setTokenStream(TokenStream) - it
doesn't override the stored value.
--
Best regards,
Andrzej Bialecki
Lucene 2.9.1 and 3.0.
Your feedback is welcome - please use the Google Issue tracker to report
issues.
Merry Christmas!
--
Best regards,
Andrzej Bialecki
and if not existent fall back to the zero-arg ctor.
I'll open an issue.
Indeed - thanks!
--
Best regards,
Andrzej Bialecki
that this encoding causes
(and what input values effectively come out the same, once encoded).
--
Best regards,
Andrzej Bialecki
to edit per-commit user data Map
Bug fixes
-
* Term frequency vectors were not displayed for selected field.
Enjoy!
--
Best regards,
Andrzej Bialecki
create other fields in the document (or split this token stream
into several fields).
--
Best regards,
Andrzej Bialecki
are hardcoded somewhere deep in Thinlet,
but likely they could be made configurable.
You can find an EPS version of the Lucene logo here:
http://lucene.apache.org/images/logo.eps
--
Best regards,
Andrzej Bialecki
) that if the terms
you load are indexed that'll help. But this is mostly
a guess.
Just to clarify: IndexReader.document(doc) and .document(doc, selector)
load _only_ stored fields, they don't interact at all with the
terms-related part of Lucene.
--
Best regards,
Andrzej Bialecki
Andrzej Bialecki wrote:
Hi all,
I'm happy to announce the new release of Luke - the Lucene Index Toolbox.
There's a bug in this version in that it doesn't show TermVectors for a
field. I'll fix it in a few days - I'm waiting for other potential bugs
to show up. So if you find something
! :)
--
Best regards,
Andrzej Bialecki
/Downloads/LucidGaze-for-Lucene
--
Best regards,
Andrzej Bialecki
, September 3rd 2009
11:00-11:30AM PDT / 14:00-14:30 EDT
Follow this link to sign up:
http://www2.eventsvc.com/lucidimagination/event/ff97623d-3fd5-43ba-a69d-650dcb1d6bbc?trk=WR-SEP2009-AP
About:
Lucene Performance Workshop:
Understanding Lucene Search Performance
with Andrzej Bialecki
Experienced
. analyzed tokens in the field should become apparent.
--
Best regards,
Andrzej Bialecki
that this is an early
preview. Also, various UI glitches are probably related to the Thinlet
toolkit - again, one day I may re-write Luke using something else, but
for now I don't have the strength to do it. :)
--
Best regards,
Andrzej Bialecki
://www.getopt.org/luke) can
export all stored fields from all documents into an XML file.
--
Best regards,
Andrzej Bialecki
.
(Actually: does CheckIndex warn about unused files in the index directory
so people can clean them up? i'm not sure)
It doesn't. But Luke has a function to do this.
--
Best regards,
Andrzej Bialecki
Andrzej Bialecki wrote:
(sorry for cross-posting)
Hi all,
I'm happy to announce a new release of Luke, the Lucene Index Toolbox.
As usual, you can obtain it from here:
http://www.getopt.org/luke
If you tried to access this URL during the last couple of hours, the site was
down. It should
an arbitrary re-sorting of top-N
results, according to your rules of preference (business rules, or
heuristics). This way you can avoid the overfitting or doing endless
tweaking, and still get the ranking that makes sense to your users.
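
As a sketch of what such a re-sorting step might look like: take only the top N hits as ranked by score, then reorder them with a preference rule. The Hit class and the "in stock first" rule are purely hypothetical stand-ins for whatever business rules or heuristics apply.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TopNRerankSketch {

    /** Hypothetical search hit; inStock stands in for any business attribute. */
    static final class Hit {
        final int docId;
        final float score;
        final boolean inStock;

        Hit(int docId, float score, boolean inStock) {
            this.docId = docId;
            this.score = score;
            this.inStock = inStock;
        }
    }

    /** Re-sorts the first n hits (already ordered by score) by the preference rule. */
    static List<Hit> rerank(List<Hit> hitsByScore, int n) {
        List<Hit> top = new ArrayList<>(hitsByScore.subList(0, Math.min(n, hitsByScore.size())));
        Comparator<Hit> inStockFirst = Comparator.comparing((Hit h) -> !h.inStock);
        Comparator<Hit> scoreDescending = Comparator.comparingDouble((Hit h) -> h.score).reversed();
        top.sort(inStockFirst.thenComparing(scoreDescending));
        return top;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(1, 0.9f, false));
        hits.add(new Hit(2, 0.8f, true));
        hits.add(new Hit(3, 0.7f, true));
        for (Hit h : rerank(hits, 3)) {
            System.out.println(h.docId + " " + h.score + " " + h.inStock);
        }
    }
}
```

Because only the top N are touched, the relevance ranking still decides which documents are candidates; the rule only decides their final order.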
--
Best regards,
Andrzej Bialecki
per field in Overview - contributed
by Mark Harwood.
o Improved the Analysis plugin to show all token information,
and highlight whenever a token is selected from the list.
* Bug fixes:
o (None)
--
Best regards,
Andrzej Bialecki
to the classpath when you start Luke.
--
Best regards,
Andrzej Bialecki
liat oren wrote:
Ok, thanks.
I will have to edit the code of Luke in order to add another analyzer,
right?
No - if your analyzer is already on the classpath, then it's enough to
type in the fully qualified class name in the drop down box (it's editable).
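
One common way a tool can turn a typed-in class name into an object is reflection, sketched below. Whether Luke does exactly this is an assumption; the class and method names here are hypothetical, and the example loads a JDK class only so that it runs without Lucene on the classpath.

```java
final class ClassNameLoaderSketch {

    /** Loads a class by fully qualified name and calls its zero-argument constructor. */
    static Object instantiate(String fullyQualifiedName) throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(fullyQualifiedName);
        return clazz.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // A custom analyzer would be named the same way, e.g. "com.example.MyAnalyzer"
        // (hypothetical), provided its jar is on the classpath.
        Object instance = instantiate("java.util.ArrayList");
        System.out.println(instance.getClass().getName());
    }
}
```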
--
Best regards,
Andrzej Bialecki
Hi all,
I apologize for the inconvenience - the site went down without any prior
notice from the ISP. I'm investigating the issue ...
--
Best regards,
Andrzej Bialecki
be messy - it's better to propose that this information should
be added to API.
--
Best regards,
Andrzej Bialecki
, the
search worked fine using 2.4.
Any ideas why this is happening?
No idea - but perhaps this is somehow related:
https://issues.apache.org/jira/browse/LUCENE-1452
--
Best regards,
Andrzej Bialecki
of Lucene involved.
--
Best regards,
Andrzej Bialecki
it. (with a score etc)
I can see the case for this would be a news-article and several people
writing queries to get alerted if it matched a certain condition.
http://www.seas.upenn.edu/~svilen/publications/subscribe.pdf
--
Best regards,
Andrzej Bialecki
commits option was specified. Reported by Mark Harwood.
o Empty index with no fields was reported as invalid. Discovered by
Andrew Zhang and Michael McCandless (LUCENE-1454).
Thank you!
--
Best regards,
Andrzej Bialecki
other places ... I forgot about the use
of IndexFileDeleter - and indeed passing the read-only flag here can
solve this, because then I can always use KeepAllDeletionPolicy when
opening read-only.
Thanks for the report!
--
Best regards,
Andrzej Bialecki
IndexReader.undeleteAll().
--
Best regards,
Andrzej Bialecki
format, incompatible with earlier versions of
Lucene (including 2.4 release).
--
Best regards,
Andrzej Bialecki
Andrzej Bialecki wrote:
1) Luke 2.4 release. This has the advantage of being an official stable
[...]
2) Luke 2.9-dev snapshot. This has the advantage that you get the
[...]
Of course I meant Lucene 2.4 and Lucene 2.9-dev ... sorry for the confusion.
--
Best regards,
Andrzej Bialecki
else does it, it's simply not going to
happen. All code in Luke except for the Thinlet class is under Apache
License, so feel free to start coding :)
--
Best regards,
Andrzej Bialecki
this in the proposals for the next
summer.
--
Best regards,
Andrzej Bialecki
it needs to read this info from the .ti file.
--
Best regards,
Andrzej Bialecki
methods on Fieldable that test the
validity of flag combinations with a particular version of Lucene?
--
Best regards,
Andrzej Bialecki
: ConjunctionScorer, lines 85-103 - pay attention to the
comments there, it's not strictly a sort by frequency, rather by the
sampled sparseness.
--
Best regards,
Andrzej Bialecki
scoring and not afterwards. FilteredQuery internally
makes use of skipTo(), which should help to limit the number of
evaluated docs.
--
Best regards,
Andrzej Bialecki
are set now like this:
isIndexed = true;
isTokenized = true;
omitNorms = true;
The end result of processing such a field is (I believe) conceptually
equivalent to adding as many Fields as there are tokens, each with
omitNorms=true.
--
Best regards,
Andrzej Bialecki
Otis Gospodnetic wrote:
So in other words, it *is* possible to have the field both tokenized and its
norms omitted?
Yes. Probably this is an unintended side-effect of adding setOmitNorms,
but I think it's useful and IMHO we should keep it.
--
Best regards,
Andrzej Bialecki
the same thing in Case sensitivity thread - it's
possible to have a tokenized field and omit its norms.
--
Best regards,
Andrzej Bialecki
(org.apache.nutch.indexer.IndexSorter).
--
Best regards,
Andrzej Bialecki
, which requires overriding other methods.
--
Best regards,
Andrzej Bialecki
by doc.
--
Best regards,
Andrzej Bialecki
Indices
... and quite a few other papers that I don't remember now ... please do
a search for distributed IR on ACM or Citeseer.
--
Best regards,
Andrzej Bialecki
;) Perhaps you could use a FilterIndexReader to
maintain a map between new IDs and old IDs, and remap on the fly.
Although I think that some parts of Lucene depend on the fact that in a
normal index the IDs are monotonically increasing ... this would
complicate the issue.
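
The remapping itself can be as simple as a pair of arrays, as in this sketch. It assumes the new-to-old mapping is a full permutation of 0..n-1 and uses hypothetical names. It does not show the reader wrapper, and it does not address the monotonic-ID caveat above.

```java
final class DocIdRemapSketch {

    private final int[] newToOld;
    private final int[] oldToNew;

    /** Builds both directions from a new-to-old permutation of 0..n-1. */
    DocIdRemapSketch(int[] newToOld) {
        this.newToOld = newToOld.clone();
        this.oldToNew = new int[newToOld.length];
        for (int newId = 0; newId < newToOld.length; newId++) {
            oldToNew[newToOld[newId]] = newId;
        }
    }

    /** Translates an ID seen by callers into the ID stored in the underlying index. */
    int toOld(int newId) {
        return newToOld[newId];
    }

    /** Translates an underlying index ID into the ID exposed to callers. */
    int toNew(int oldId) {
        return oldToNew[oldId];
    }

    public static void main(String[] args) {
        DocIdRemapSketch remap = new DocIdRemapSketch(new int[] {2, 0, 1});
        System.out.println(remap.toOld(0) + " " + remap.toNew(2));
    }
}
```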
--
Best regards,
Andrzej
something better,
AFAIK, PDFBox has a lower-level API that allows you to get hold of text
positions.
--
Best regards,
Andrzej Bialecki
_and_ formatting from any
documents that could be normally opened with MS Office - however,
performance was an issue, i.e. it was slow, a CPU/memory hog, and
occasionally it would get stuck in a weird state where only a complete
reboot would help.
--
Best regards,
Andrzej Bialecki
that
executes in a distributed fashion (not sure if map-reduce is the best
model here), but first copies the indexes to LocalFileSystem.
--
Best regards,
Andrzej Bialecki
there is to it :)
--
Best regards,
Andrzej Bialecki
/ ram / fs for specific
index file types (e.g. tis, tii, fdt, prx and so on) - you should be
able to cut and paste large chunks of each directory code to start the
implementation.
--
Best regards,
Andrzej Bialecki
is not available ... Luke populates this
screen using Document.getFields(). If a field is unstored then it's not
returned in this list, so it's not possible to get its flags.
--
Best regards,
Andrzej Bialecki
I'll include it in
a minor update.
--
Best regards,
Andrzej Bialecki
this column now reads Norms and shows the fieldNorm value of a field.
Have fun!
--
Best regards,
Andrzej Bialecki
://wt.jrc.it/lt/Acquis/
--
Best regards,
Andrzej Bialecki
to the on-disk index), and start using
the new IndexSearcher.
And again, start accumulating new docs in the RAMDirectory, etc, etc ...
--
Best regards,
Andrzej Bialecki
it to the local filesystem first...
Yes - see org.apache.nutch.indexer.FsDirectory. However, you will not
like the performance, it's much slower than using the index locally.
--
Best regards,
Andrzej Bialecki
to implement, yet produces useful results difficult to obtain through
the usual means (similarity, boosting, even function query).
--
Best regards,
Andrzej Bialecki
?
(I'm not involved in Wikia development). There are some ways to go about
it even in the pure Lucene-land, so that the updates are fast without
reindexing the main content. Hint: ParallelReader.
--
Best regards,
Andrzej Bialecki