How can I remove them from time to time? With the snapcleaner script I
only seem to have the option to delete the last day's snapshots.
Thanks a lot, Noble, and sorry again for all these questions.
Noble Paul നോബിള് नोब्ळ् wrote:
The hardlinks will prevent the unused files from getting cleaned up.
So the
Marc,
I don't have a multicore setup that's itching for better logging, but I
think what you are suggesting is good. If I had a multicore setup I might want
either separate logs or the option to log the core name. Perhaps an
Enhancement-type JIRA entry is in order?
Otis --
Sematext --
Hi,
snapcleaner lets you delete snapshots by one of the following two criteria:
- delete all but last N snapshots
- delete all snapshots older than N days
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
From: sunnyfr johanna...@gmail.com
On Tue, Feb 17, 2009 at 1:10 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
Right. But I was trying to point out that a single 150MB Document is not
in fact what the o.p. wants to do. For example, if your 150MB represents,
say, a whole book, should that really be a single document?
Hi all,
I have been experimenting with Solr faceted search for 2 weeks, but I am
hitting a performance limitation on faceted search.
My Solr index contains 4,000,000 documents. Normal searching is fairly fast, but
faceted search is extremely slow.
I am trying to do a faceted search on 3 fields (all multivalued fields) in
Have you tried a nightly build with the new facet algorithm (it is
activated by default)?
http://www.nabble.com/new-faceting-algorithm-td20674902.html
Wang Guangchen wrote:
Hi all,
I have been experimenting with Solr faceted search for 2 weeks, but I am
hitting a performance limitation on facet
I was looking for such a tool and haven't found it yet.
Using StandardAnalyzer one can obtain some form of token-stream which
can be used for agnostic analysis.
Clearly, then, something that matches words in a dictionary and
decides on the language based on the language of the majority could
Nope, I am using the latest stable version of solr 1.3.0.
Thanks for your tips.
Besides this, is there anything else I should do? I am reading some
previous threads about index optimization (
http://www.mail-archive.com/solr-user@lucene.apache.org/msg05290.html). Will
it improve the facet
Paul Libbrecht schrieb:
Clearly, then, something that matches words in a dictionary and decides
on the language based on the language of the majority could do a decent
job to decide the analyzer.
Does such a tool exist?
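The dictionary-majority idea quoted above can be sketched in a few lines. The word lists here are tiny stand-ins for real dictionaries, not an actual detector:

```python
def guess_language(text, dictionaries):
    """Guess a text's language by counting how many of its tokens
    appear in each language's word list; the majority wins.

    `dictionaries` maps a language name to a set of known words.
    """
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    scores = {lang: sum(t in words for t in tokens)
              for lang, words in dictionaries.items()}
    return max(scores, key=scores.get)
```

A real implementation would load proper word lists and probably fall back to n-gram statistics (as ngramj does) for short texts where dictionary hits are sparse.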
I once played around with http://ngramj.sourceforge.net/ for language
Well, doing an optimization after you finish indexing will always improve your
search speed a little bit. But with the new facet algorithm you will notice a
huge improvement ...
Other things to consider are to index and store only the necessary fields,
and to set omitNorms wherever possible... there are many
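The omitNorms suggestion might look like this in schema.xml (the field name and type are illustrative, not from the original thread):

```xml
<!-- Facet field: indexed but not stored; norms aren't needed for
     faceting, so omitNorms="true" saves heap per document. -->
<field name="category" type="string" indexed="true" stored="false"
       omitNorms="true" multiValued="true"/>
```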
Thank you very much.
On Tue, Feb 17, 2009 at 6:04 PM, Marc Sturlese marc.sturl...@gmail.comwrote:
Well, doing an optimization after you finish indexing will always improve your
search speed a little bit. But with the new facet algorithm you will notice a
huge improvement ...
Other things to
Hi,
I'm trying to write some code to build a facet list for a date field,
but I don't know what the first and last available dates are. I would
adjust the gap param accordingly. If there is a 10yr stretch between
min(date) and max(date) I'd want to facet by year. If it is a 1 month
gap, I'd
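One way to sketch that gap-picking logic, once min(date) and max(date) are known (e.g. from two rows=1 queries sorted ascending and descending on the date field). The thresholds here are my own guesses, not anything Solr prescribes:

```python
from datetime import datetime

def choose_date_gap(min_date, max_date):
    """Pick a facet.date.gap (Solr DateMath syntax) based on the span
    between the oldest and newest document dates."""
    span_days = (max_date - min_date).days
    if span_days > 2 * 365:      # multi-year spread: facet by year
        return "+1YEAR"
    elif span_days > 60:         # a few months: facet by month
        return "+1MONTH"
    else:                        # short range: facet by day
        return "+1DAY"
```

The returned string would then be passed as the facet.date.gap parameter alongside facet.date.start and facet.date.end.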
Does Apache Tika help find the language of the given document?
On 2/17/09, Till Kinstler kinst...@gbv.de wrote:
Paul Libbrecht schrieb:
Clearly, then, something that matches words in a dictionary and decides on
the language based on the language of the majority could do a decent job to
On Tue, Feb 17, 2009 at 4:42 PM, Steffen B. s.baumg...@fhtw-berlin.dewrote:
Unfortunately, this rollback does not refill the index with the old data,
nor does it keep the old index from being overwritten with the new,
erroneous index. Now my question is: is there anything I can do to keep
Maybe you can try postImportDeleteQuery (not yet documented,
SOLR-801) on a root entity.
You can keep a timestamp field holding the value of
${dataimporter.index_start_time}. Use that to remove old
docs which may exist in the index from before the indexing started.
--Noble
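Noble's suggestion could be sketched in data-config.xml roughly like this. The entity and field names, and the exact delete-query syntax, are my assumptions, not a tested config:

```xml
<!-- Sketch only: stamp each document with the import start time, then
     delete anything stamped earlier than this run's start time. -->
<entity name="item" pk="id" transformer="TemplateTransformer"
        query="SELECT id, name FROM item"
        postImportDeleteQuery="index_time:[* TO ${dataimporter.index_start_time}]">
  <!-- every document from this run carries the run's start timestamp -->
  <field column="index_time" template="${dataimporter.index_start_time}"/>
  <field column="id" name="id"/>
  <field column="name" name="name"/>
</entity>
```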
Hey, I have 2 problems that I think are really important and can be useful
for other users:
1.) I am running 3 cores in a solr instance. Each core contains about a
million and a half docs. Once a full-import is run in a core it will free
just a little bit of Java memory. Once that first
Hi,
No, Tika doesn't do LangID. I haven't used ngramj, so I can't speak to its
accuracy or speed (but I know the code has been around for years). Another
LangID implementation is at the URL below my name.
Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
CharFilter can normalize (convert) traditional Chinese to simplified
Chinese or vice versa,
if you define mapping.txt. Here is the sample of Chinese character
normalization:
https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
See SOLR-822 for the details:
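For reference, a MappingCharFilterFactory setup of the kind SOLR-822 introduces might look like the sketch below; the field type name, tokenizer choice, and the example mapping pair are illustrative, not taken from the patch:

```xml
<fieldType name="text_cjk" class="solr.TextField">
  <analyzer>
    <!-- mapping.txt holds pairs such as:  "體" => "体"  -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>
```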
Hmm, Otis, very nice!
Koji
Otis Gospodnetic wrote:
Hi,
Wouldn't this be as easy as:
- split email into paragraphs
- for each paragraph compute signature (MD5 or something fuzzier, like in
SOLR-799)
- for each signature look for other emails with this signature
- when you find an email with
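The paragraph-signature steps above can be sketched as follows (MD5 variant; the SOLR-799 fuzzy signature would replace the hash function, and the whitespace normalization is my own choice):

```python
import hashlib

def paragraph_signatures(email_text):
    """Split an email body into paragraphs (blank-line separated) and
    compute an MD5 signature for each non-empty paragraph after
    collapsing whitespace and lowercasing."""
    sigs = {}
    for para in email_text.split("\n\n"):
        normalized = " ".join(para.split()).lower()
        if normalized:
            sigs[hashlib.md5(normalized.encode("utf-8")).hexdigest()] = normalized
    return sigs

def shared_paragraphs(email_a, email_b):
    """Return paragraphs that appear (after normalization) in both emails."""
    a = paragraph_signatures(email_a)
    b = paragraph_signatures(email_b)
    return [a[sig] for sig in a.keys() & b.keys()]
```

In the actual setup each signature would be indexed as a field, so "other emails with this signature" becomes a simple Solr term query rather than a pairwise comparison.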
It *looks* as though Solr supports returning the results of arbitrary
calculations:
http://wiki.apache.org/solr/SolrQuerySyntax
However, I am so far unable to get any example working except in the
context of a dismax bf. It seems like one ought to be able to write a
query to return the doc
On Mon, Feb 16, 2009 at 3:22 PM, Fergus McMenemie fer...@twig.me.uk wrote:
2) Having used TemplateTransformer to assign a value to an
entity column, that column cannot be used in other
TemplateTransformer operations. In my project I am
attempting to reuse x.fileWebPath. To fix
A snapshot is created every time snapshooter is invoked, even if there is no
change in the index. However, since snapshots are created using hard
links, no additional space is used if there are no changes to the index. It
does use up one directory entry in the data directory.
Bill
On Mon, Feb
Snapshots are created using hard links. So even though it is as big as the
index, it is not taking up any more space on the disk. The size of the
snapshot will change as the size of the index changes.
Bill
On Mon, Feb 16, 2009 at 9:50 AM, sunnyfr johanna...@gmail.com wrote:
It changes a lot
usage: snapcleaner -D days | -N num [-d dir] [-u username] [-v]
    -D days   clean up snapshots more than days days old
    -N num    keep the most recent num snapshots and
              clean up the remaining ones that are not being pulled
    -d        specify
I run snapcleaner from cron. That cleans up old snapshots once
each day. Here is a crontab line that runs it at 30 minutes past
the hour, every hour.
30 * * * * /apps/wss/solr_home/bin/snapcleaner -N 3
wunder
On 2/17/09 7:23 AM, Bill Au bill.w...@gmail.com wrote:
usage: snapcleaner -D days |
Requesting 5000 rows will use a lot of server time, because
it has to fetch the information for 5000 results when it
makes the response.
It is much more efficient to request only the results you
will need, usually 10 at a time.
wunder
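Concretely, Walter's advice means paging with the standard start and rows parameters instead of fetching everything at once (host and query are hypothetical):

```
http://localhost:8983/solr/select?q=ipod&start=0&rows=10     page 1
http://localhost:8983/solr/select?q=ipod&start=10&rows=10    page 2
```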
On 2/17/09 3:30 AM, Jana, Kumar Raja kj...@ptc.com wrote:
Hello,
We are indexing information from different sources, so we would like to
centralize the content so that it can be retrieved using the ID
provided by Solr.
Has anyone done something like this, and do you have any advice? I am
thinking of storing the content in a database like MySQL.
Thanks,
Hi Otis,
But this is not freeware, right?
On 2/17/09, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
Hi,
No, Tika doesn't do LangID. I haven't used ngramj, so I can't speak for
its accuracy nor speed (but I know the code has been around for
years). Another LangID implementation is
Sure, we are doing essentially that with our Drupal integration module
- each search result contains a link to the real content, which is
stored in MySQL, etc, and presented via the Drupal CMS.
http://drupal.org/project/apachesolr
-Peter
On Tue, Feb 17, 2009 at 11:57 AM, roberto
Jana, Kumar Raja wrote:
2. If I set SolrQuery.setTimeAllowed(2000), will this kill query
processing after 2 secs? (I know this question sounds silly but I just
want a confirmation from the experts :)
That is the idea, but only some of the code is within the timer. So,
there are cases
A common approach (for web search engines) is to use HBase [1] as a
Document Repository. Each document indexed inside Solr will have an
entry (row, identified by the document URL) in the HBase table. This
works great when you deal with a large data collection (it scales better
than a SQL
There are a number of options for freeware here, just do some
searching on your favorite Internet search engine.
TextCat is one of the more popular, as I seem to recall:
http://odur.let.rug.nl/~vannoord/TextCat/
I believe Karl Wettin submitted a Lucene patch for a Language guesser:
On 2/17/09 12:26 PM, Grant Ingersoll gsing...@apache.org wrote:
If purchasing, several companies offer solutions, but I don't know
that their quality is any better than what you can get through open
source, as generally speaking, the problem is solved with a high
degree of accuracy through
Preface: This is my first attempt at using solr.
What happens if I need to do a change to a solr schema that's already
in production? Can fields be added or removed?
Can a type change from an integer to a float?
Thanks in advance,
Jon
--
Jonathan Haddad
http://www.rustyrazorblade.com
This is a straightforward question, but I haven't been able to figure out
what is up with my application.
I seem to be able to search on trailing wildcards just fine. For example,
fieldName:a* will return documents with apple, aardvark, etc. in them. But
if I were to try and search on a field
I'm using the DataImportHandler to load data. I created a custom row
transformer, and inside of it I'm reading a configuration file. I am using
the system's solr.solr.home property to figure out which directory the file
should be in. That works for a single-core deployment, but not for
multi-core
On Wed, Feb 18, 2009 at 5:53 AM, wojtekpia wojte...@hotmail.com wrote:
Is there a clean way to resolve the actual
conf directory path from within a custom row transformer so that it works
for both single-core and multi-core deployments?
You can use Context.getSolrCore().getInstanceDir()
--
On Wed, Feb 18, 2009 at 3:37 AM, Jonathan Haddad j...@letsgetnuts.com wrote:
Preface: This is my first attempt at using solr.
What happens if I need to do a change to a solr schema that's already
in production? Can fields be added or removed?
You may need a core reload or a server restart.
Hi,
I want to store normalized data in Solr. For example, I am splitting
personal information (fname, lname, mname) into one Solr record, and address
(personal, office) into another record; the ids are different:
123212_name, 123212_add.
Now, in some cases I require both personal and
Thanks wunder for the response.
So I would like to know: if I were to limit the result set from Solr to 10
and my query actually matches, say, 1000 documents, will the query
processing stop the moment the search finds the first 10 documents? Or
will the entire search be carried out and then sorted
Thanks Sean. That clears up the timer concept.
Is there any other way through which I can make sure that the server
time is not wasted?
-Original Message-
From: Sean Timm [mailto:tim...@aol.com]
Sent: Wednesday, February 18, 2009 1:00 AM
To: solr-user@lucene.apache.org
Subject: Re:
Hi,
There are no entity relationships in Solr and there are no joins, so the
simplest thing to do in this case is to issue two requests. You could also
write a custom SearchComponent that internally does two requests and returns a
single unified response.
Otis
--
Sematext --
Jim,
Does app*l or even a*p* work? Perhaps apple gets stemmed to something that
doesn't end in e, such as appl?
Regarding your config, you probably want to lowercase before removing stop
words, so you'll want to change the order of those filters a bit. That's not
related to your wildcard