Re: delete snapshot??

2009-02-17 Thread sunnyfr
How can I remove snapshots from time to time? The snapcleaner script only seems to give me the option to delete the last day??? Thanks a lot, Noble, and sorry again for all these questions. Noble Paul നോബിള്‍ नोब्ळ् wrote: The hardlinks will prevent the unused files from getting cleaned up. So the

Re: dealing with logs - feature advice based on a use case

2009-02-17 Thread Otis Gospodnetic
Marc, I don't have a Multicore setup that's itching for better logging, but I think what you are suggesting is good.  If I had a multicore setup I might want either separate logs or the option to log the core name.  Perhaps an Enhancement type JIRA entry is in order? Otis -- Sematext --

Re: delete snapshot??

2009-02-17 Thread Otis Gospodnetic
Hi, snapcleaner lets you delete snapshots by one of the following two criteria:
- delete all but the last N snapshots
- delete all snapshots older than N days
Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch From: sunnyfr johanna...@gmail.com To:
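The two criteria can be sketched in Python over a list of snapshot directory names. This is a hypothetical illustration, not snapcleaner itself (which is a shell script). It assumes names of the form snapshot.yyyymmddHHMMSS, and it ignores the "not being pulled" check that the real -N option performs.

```python
from datetime import datetime, timedelta

def snapshot_time(name):
    """Parse the timestamp from a name like 'snapshot.20090217120000' (assumed format)."""
    return datetime.strptime(name.split(".", 1)[1], "%Y%m%d%H%M%S")

def to_delete_keep_last(names, n):
    """Criterion 1 (like -N): keep the n most recent snapshots, delete the rest."""
    newest_first = sorted(names, key=snapshot_time, reverse=True)
    return sorted(newest_first[n:], key=snapshot_time)

def to_delete_older_than(names, days, now):
    """Criterion 2 (like -D): delete snapshots more than `days` days old."""
    cutoff = now - timedelta(days=days)
    return sorted((s for s in names if snapshot_time(s) < cutoff), key=snapshot_time)

snaps = ["snapshot.20090214120000", "snapshot.20090215120000",
         "snapshot.20090216120000", "snapshot.20090217120000"]
now = datetime(2009, 2, 17, 18, 0, 0)
print(to_delete_keep_last(snaps, 3))        # ['snapshot.20090214120000']
print(to_delete_older_than(snaps, 2, now))  # ['snapshot.20090214120000', 'snapshot.20090215120000']
```

The two policies can disagree. A count-based policy always leaves N snapshots on disk. An age-based policy can delete all of them if snapshooter has not run recently.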

Re: Outofmemory error for large files

2009-02-17 Thread Shalin Shekhar Mangar
On Tue, Feb 17, 2009 at 1:10 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Right. But I was trying to point out that a single 150MB Document is not in fact what the o.p. wants to do. For example, if your 150MB represents, say, a whole book, should that really be a single document?

Facet search on Multi-Valued Fields

2009-02-17 Thread Wang Guangchen
Hi all, I have been experimenting with Solr faceted search for 2 weeks, but I have hit a performance limitation with facet search. My Solr index contains 4,000,000 documents. Normal searching is fairly fast, but faceted search is extremely slow. I am trying to do facet search on 3 fields (all multivalued fields) in

Re: Facet search on Multi-Valued Fields

2009-02-17 Thread Marc Sturlese
Have you tried a nightly build with the new facet algorithm (it is activated by default)? http://www.nabble.com/new-faceting-algorithm-td20674902.html Wang Guangchen wrote: Hi all, I have been experimenting solr faceted search for 2 weeks. But I meet performance limitation on facet

Re: Multilanguage

2009-02-17 Thread Paul Libbrecht
I was looking for such a tool and haven't found it yet. Using StandardAnalyzer one can obtain some form of token-stream which can be used for agnostic analysis. Clearly, then, something that matches words in a dictionary and decides on the language based on the language of the majority could
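Here is a minimal sketch of the dictionary-majority idea. It is written in Python rather than against Lucene, so a regex tokenizer stands in for the StandardAnalyzer token stream. The tiny word lists are hypothetical placeholders, and a usable tool would need far larger dictionaries.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny dictionaries (common function words only).
DICTIONARIES = {
    "en": {"the", "and", "is", "of", "to", "in", "with"},
    "de": {"der", "die", "und", "ist", "von", "zu", "mit"},
    "fr": {"le", "la", "et", "est", "de", "à", "avec"},
}

def guess_language(text):
    """Return the language whose dictionary matches the most tokens, or None."""
    tokens = re.findall(r"\w+", text.lower())
    votes = Counter()
    for token in tokens:
        for lang, words in DICTIONARIES.items():
            if token in words:
                votes[lang] += 1
    if not votes:
        return None  # no token found in any dictionary
    return votes.most_common(1)[0][0]

print(guess_language("The index is built with the data of the day"))  # en
print(guess_language("Der Index ist mit den Daten von heute gebaut"))  # de
```

The guessed language could then select which language-specific analyzer or field to use.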

Re: Facet search on Multi-Valued Fields

2009-02-17 Thread Wang Guangchen
Nope, I am using the latest stable version of Solr, 1.3.0. Thanks for your tips. Besides this, is there anything else I should do? I am reading some previous threads about index optimization. ( http://www.mail-archive.com/solr-user@lucene.apache.org/msg05290.html), Will it improve the facet

Re: Multilanguage

2009-02-17 Thread Till Kinstler
Paul Libbrecht wrote: Clearly, then, something that matches words in a dictionary and decides on the language based on the language of the majority could do a decent job to decide the analyzer. Does such a tool exist? I once played around with http://ngramj.sourceforge.net/ for language

Re: Facet search on Multi-Valued Fields

2009-02-17 Thread Marc Sturlese
Well, optimizing after you index will always improve your search speed a little bit. But with the new facet algorithm you will notice a huge improvement... Other things to consider: index and store only the necessary fields, and use omitNorms whenever possible... there are many

Re: Facet search on Multi-Valued Fields

2009-02-17 Thread Wang Guangchen
Thank you very much. On Tue, Feb 17, 2009 at 6:04 PM, Marc Sturlese marc.sturl...@gmail.com wrote: Well doing an optimization after you do indexing will always improve your search speed a little bit. But with the new facet algorithm you will note a huge improvement ... Other things to

Finding total range of dates for date faceting

2009-02-17 Thread Jacob Singh
Hi, I'm trying to write some code to build a facet list for a date field, but I don't know what the first and last available dates are. I would adjust the gap param accordingly. If there is a 10yr stretch between min(date) and max(date) I'd want to facet by year. If it is a 1 month gap, I'd
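One way to approach this is sketched below. It is an assumption, not something the thread confirms. Two cheap requests find the first and last dates by sorting on the date field and fetching one row each. The span between them then picks the gap. The field name "created" and the gap thresholds are hypothetical. The facet.date.* parameters are Solr's date-faceting parameters.

```python
from datetime import datetime
from urllib.parse import urlencode

def min_max_queries(field):
    """Query strings for the earliest and latest values of `field`:
    sort ascending / descending and fetch a single row each time."""
    base = {"q": "*:*", "rows": 1, "fl": field}
    return (urlencode({**base, "sort": field + " asc"}),
            urlencode({**base, "sort": field + " desc"}))

def choose_gap(first, last):
    """Pick a Solr date-math gap from the span; thresholds are arbitrary."""
    span_days = (last - first).days
    if span_days > 2 * 365:
        return "+1YEAR"
    if span_days > 60:
        return "+1MONTH"
    if span_days > 2:
        return "+1DAY"
    return "+1HOUR"

def facet_params(field, first, last):
    """Date-facet parameters covering [first, last] with an adaptive gap."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # Solr's UTC date format
    return {
        "q": "*:*",
        "rows": 0,
        "facet": "true",
        "facet.date": field,
        "facet.date.start": first.strftime(fmt),
        "facet.date.end": last.strftime(fmt),
        "facet.date.gap": choose_gap(first, last),
    }

asc, desc = min_max_queries("created")
print(facet_params("created", datetime(1999, 1, 1), datetime(2009, 1, 1)))
```

A 10-year span yields "+1YEAR" and a one-month span yields "+1DAY", matching the year/day split described above.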

Re: Multilanguage

2009-02-17 Thread revathy arun
Does Apache Tika help find the language of the given document? On 2/17/09, Till Kinstler kinst...@gbv.de wrote: Paul Libbrecht wrote: Clearly, then, something that matches words in a dictionary and decides on the language based on the language of the majority could do a decent job to

Re: DIH full-import with clean=true fails and rollback empties index

2009-02-17 Thread Shalin Shekhar Mangar
On Tue, Feb 17, 2009 at 4:42 PM, Steffen B. s.baumg...@fhtw-berlin.de wrote: Unfortunately, this rollback does not refill the index with the old data, nor does it keep the old index from being overwritten with the new, erroneous index. Now my question is: is there anything I can do to keep

Re: DIH full-import with clean=true fails and rollback empties index

2009-02-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
Maybe you can try postImportDeleteQuery (not yet documented, SOLR-801) on a root entity. You can keep a timestamp field that stores the value of ${dataimporter.index_start_time}, and use it to remove old docs that existed in the index before the indexing started. --Noble
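Below is a rough Python simulation of the idea, as a sketch of the logic only. It is not DIH code, and the dict-based "index" is hypothetical. The import runs without clean=true. Every imported doc is stamped with the import start time, and only docs that still carry an older stamp are deleted afterwards. If the import fails partway, the cleanup step is never reached, so the old documents survive.

```python
from datetime import datetime

def full_import(index, rows, start_time):
    """Simulated full import without clean=true.
    `index` maps id -> doc; start_time stands in for ${dataimporter.index_start_time}."""
    for row in rows:  # if fetching a row raises, the cleanup below never runs
        index[row["id"]] = {**row, "timestamp": start_time}
    # Post-import cleanup: remove docs this run did not touch (older stamp),
    # analogous to a delete-by-query on the timestamp field.
    stale = [doc_id for doc_id, doc in index.items() if doc["timestamp"] < start_time]
    for doc_id in stale:
        del index[doc_id]
    return stale

index = {"old-1": {"id": "old-1", "timestamp": datetime(2009, 2, 16)}}
print(full_import(index, [{"id": "new-1"}], datetime(2009, 2, 17)))  # ['old-1']
```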

2 strange behaviours with DIH full-import.

2009-02-17 Thread Marc Sturlese
Hey, I have 2 problems that I think are really important and could be useful for other users: 1.) I am running 3 cores in a Solr instance. Each core contains about a million and a half docs. Once a full-import is run in a core, it frees just a little bit of Java memory. Once that first

Re: Multilanguage

2009-02-17 Thread Otis Gospodnetic
Hi, No, Tika doesn't do LangID. I haven't used ngramj, so I can't speak for its accuracy or speed (but I know the code has been around for years). Another LangID implementation is at the URL below my name. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Re: indexing Chinese language

2009-02-17 Thread Koji Sekiguchi
CharFilter can normalize (convert) traditional Chinese to simplified Chinese or vice versa, if you define mapping.txt. Here is a sample of Chinese character normalization: https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG See SOLR-822 for the details:
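The sketch below shows what such a mapping does. It assumes mapping-file lines of the form "source" => "target", as in the mapping files used with the mapping CharFilter, and reimplements longest-match replacement in plain Python. It is not the SOLR-822 code: it ignores escape sequences and the offset correction a real CharFilter performs. The two traditional-to-simplified pairs are examples only.

```python
import re

# Example mapping rules; a real mapping.txt would hold many more pairs.
MAPPING_TXT = '''
# traditional => simplified
"國" => "国"
"語" => "语"
'''

RULE = re.compile(r'^"(.*)"\s*=>\s*"(.*)"$')

def load_rules(text):
    """Parse '"source" => "target"' lines; comments and blank lines are skipped."""
    rules = {}
    for line in text.splitlines():
        m = RULE.match(line.strip())
        if m:
            rules[m.group(1)] = m.group(2)
    return rules

def normalize(text, rules):
    """Scan left to right, replacing the longest matching source string."""
    longest = max(map(len, rules), default=0)
    out, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:  # no rule matched at position i: copy the character unchanged
            out.append(text[i])
            i += 1
    return "".join(out)

rules = load_rules(MAPPING_TXT)
print(normalize("中國語", rules))  # 中国语
```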

Re: Word Locations Search Components

2009-02-17 Thread Koji Sekiguchi
Hmm, Otis, very nice! Koji Otis Gospodnetic wrote: Hi, Wouldn't this be as easy as: - split email into paragraphs - for each paragraph compute signature (MD5 or something fuzzier, like in SOLR-799) - for each signature look for other emails with this signature - when you find an email with
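Otis's steps can be sketched in Python with exact MD5 signatures (the non-fuzzy option). This is a hypothetical in-memory version: in Solr the signatures would more likely be indexed in a multivalued field and looked up with a query, and a SOLR-799-style fuzzy signature would tolerate small edits that MD5 does not.

```python
import hashlib
import re
from collections import defaultdict

def paragraphs(text):
    """Split an email body on blank lines; normalize whitespace and case."""
    parts = re.split(r"\n\s*\n", text)
    return [" ".join(p.split()).lower() for p in parts if p.strip()]

def signatures(text):
    """One MD5 signature per paragraph."""
    return {hashlib.md5(p.encode("utf-8")).hexdigest() for p in paragraphs(text)}

def shared_paragraphs(emails):
    """Map each signature to the ids of emails containing that paragraph,
    keeping only signatures that occur in more than one email."""
    seen = defaultdict(set)
    for email_id, body in emails.items():
        for sig in signatures(body):
            seen[sig].add(email_id)
    return {sig: ids for sig, ids in seen.items() if len(ids) > 1}

emails = {
    "m1": "Hello all\n\nPlease reindex tonight.",
    "m2": "Thanks\n\nPlease   reindex\ntonight.",
    "m3": "Unrelated",
}
print(list(shared_paragraphs(emails).values()))  # [{'m1', 'm2'}] (set order may vary)
```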

Re: Finding total range of dates for date faceting

2009-02-17 Thread Peter Wolanin
It *looks* as though Solr supports returning the results of arbitrary calculations: http://wiki.apache.org/solr/SolrQuerySyntax However, I am so far unable to get any example working except in the context of a dismax bf. It seems like one ought to be able to write a query to return the doc

Re: DIH transformers - sect 2

2009-02-17 Thread Fergus McMenemie
On Mon, Feb 16, 2009 at 3:22 PM, Fergus McMenemie fer...@twig.me.uk wrote: 2) Having used TemplateTransformer to assign a value to an entity column, that column cannot be used in other TemplateTransformer operations. In my project I am attempting to reuse x.fileWebPath. To fix

Re: snapshot created if there is no document updated/new?

2009-02-17 Thread Bill Au
A snapshot is created every time snapshooter is invoked, even if there is no change in the index. However, since snapshots are created using hard links, no additional space is used if there are no changes to the index. It does use up one directory entry in the data directory. Bill On Mon, Feb

Re: snapshot as big as the index folder?

2009-02-17 Thread Bill Au
Snapshots are created using hard links. So even though it is as big as the index, it is not taking up any more space on the disk. The size of the snapshot will change as the size of the index changes. Bill On Mon, Feb 16, 2009 at 9:50 AM, sunnyfr johanna...@gmail.com wrote: It changes a lot
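The hard-link behaviour can be demonstrated with Python's standard library in a temporary directory. The directory and file names below are hypothetical stand-ins for a real index and snapshot. The snapshot's file shares the index file's inode, so a per-directory size check reports the full size for both, yet the data exists on disk once. It also shows why old snapshots must be cleaned up: a segment the index later drops is kept alive by the snapshot's link.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as data_dir:
    index_dir = os.path.join(data_dir, "index")
    snap_dir = os.path.join(data_dir, "snapshot.20090217120000")  # hypothetical name
    os.makedirs(index_dir)
    os.makedirs(snap_dir)

    # A stand-in for one index segment file (1 MB of filler bytes).
    segment = os.path.join(index_dir, "_0.cfs")
    with open(segment, "wb") as f:
        f.write(b"x" * 1_000_000)

    # Taking the "snapshot": hard-link the file instead of copying it.
    linked = os.path.join(snap_dir, "_0.cfs")
    os.link(segment, linked)

    a, b = os.stat(segment), os.stat(linked)
    same_data = (a.st_ino, a.st_dev) == (b.st_ino, b.st_dev)  # one inode, two names
    link_count = a.st_nlink  # 2

    # The index later drops the segment, but the snapshot still holds the data,
    # so the space is only freed once the snapshot is removed as well.
    os.remove(segment)
    still_there = os.path.getsize(linked) == 1_000_000

print(same_data, link_count, still_there)  # True 2 True
```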

Re: delete snapshot??

2009-02-17 Thread Bill Au
usage: snapcleaner -D days | -N num [-d dir] [-u username] [-v]
-D days   cleanup snapshots more than days days old
-N num    keep the most recent num number of snapshots and cleanup up the remaining ones that are not being pulled
-d        specify

Re: delete snapshot??

2009-02-17 Thread Walter Underwood
I run snapcleaner from cron. That cleans up old snapshots once each day. Here is a crontab line that runs it at 30 minutes past the hour, every hour.
30 * * * * /apps/wss/solr_home/bin/snapcleaner -N 3
wunder On 2/17/09 7:23 AM, Bill Au bill.w...@gmail.com wrote: usage: snapcleaner -D days |

Re: Query regarding setTimeAllowed(Integer) and setRows(Integer)

2009-02-17 Thread Walter Underwood
Requesting 5000 rows will use a lot of server time, because it has to fetch the information for 5000 results when it makes the response. It is much more efficient to request only the results you will need, usually 10 at a time. wunder On 2/17/09 3:30 AM, Jana, Kumar Raja kj...@ptc.com wrote:
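Requesting only what you need usually means paging with Solr's start and rows parameters. Here is a minimal Python sketch that builds one page's request URL. The host, path, and query are hypothetical.

```python
from urllib.parse import urlencode

def page_params(query, page, page_size=10):
    """Solr request parameters for one page of results (pages count from 0)."""
    return {"q": query, "start": page * page_size, "rows": page_size}

def page_url(base_url, query, page, page_size=10):
    return base_url + "?" + urlencode(page_params(query, page, page_size))

# Third page of ten results from a hypothetical local Solr.
print(page_url("http://localhost:8983/solr/select", "title:solr", 2))
# http://localhost:8983/solr/select?q=title%3Asolr&start=20&rows=10
```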

Store content out of solr

2009-02-17 Thread roberto
Hello, we are indexing information from different sources, and we would like to centralize the content so that I can retrieve it using the ID provided by Solr. Has anyone done something like this, and do you have any advice? I am thinking of storing the information in a database like MySQL. Thanks,

Re: Multilanguage

2009-02-17 Thread revathy arun
Hi Otis, But this is not freeware, right? On 2/17/09, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, No, Tika doesn't do LangID. I haven't used ngramj, so I can't speak for its accuracy nor speed (but I know the code has been around for years). Another LangID implementation is

Re: Store content out of solr

2009-02-17 Thread Peter Wolanin
Sure, we are doing essentially that with our Drupal integration module - each search result contains a link to the real content, which is stored in MySQL, etc, and presented via the Drupal CMS. http://drupal.org/project/apachesolr -Peter On Tue, Feb 17, 2009 at 11:57 AM, roberto
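The pattern is straightforward: Solr returns only ids, and the full content is fetched from the external store by those ids. Here is a minimal Python sketch in which in-memory SQLite stands in for MySQL (or HBase). The table, column, and document ids are hypothetical.

```python
import sqlite3

# In-memory SQLite as a stand-in for the external content store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (id TEXT PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO content VALUES (?, ?)",
                 [("doc-1", "Full text of the first document"),
                  ("doc-2", "Full text of the second document")])

def fetch_content(conn, solr_ids):
    """Fetch stored bodies for ids returned by Solr, preserving Solr's rank order.
    Ids missing from the store map to None."""
    if not solr_ids:
        return []
    placeholders = ",".join("?" * len(solr_ids))
    rows = dict(conn.execute(
        f"SELECT id, body FROM content WHERE id IN ({placeholders})", solr_ids))
    return [(doc_id, rows.get(doc_id)) for doc_id in solr_ids]

# Pretend these ids came back, in rank order, from a Solr query with fl=id.
print(fetch_content(conn, ["doc-2", "doc-1"]))
```

Keeping Solr's order matters because the SQL IN lookup returns rows in no particular ranking.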

Re: Query regarding setTimeAllowed(Integer) and setRows(Integer)

2009-02-17 Thread Sean Timm
Jana, Kumar Raja wrote: 2. If I set SolrQuery.setTimeAllowed(2000), will this kill query processing after 2 secs? (I know this question sounds silly, but I just want confirmation from the experts :) That is the idea, but only some of the code is within the timer. So, there are cases

Re: Store content out of solr

2009-02-17 Thread Renaud Delbru
A common approach (for web search engines) is to use HBase [1] as a Document Repository. Each document indexed inside Solr will have an entry (row, identified by the document URL) in the HBase table. This works great when you deal with a large data collection (it scales better than a SQL

Re: Multilanguage

2009-02-17 Thread Grant Ingersoll
There are a number of options for freeware here, just do some searching on your favorite Internet search engine. TextCat is one of the more popular, as I seem to recall: http://odur.let.rug.nl/~vannoord/TextCat/ I believe Karl Wettin submitted a Lucene patch for a Language guesser:

Re: Multilanguage

2009-02-17 Thread Walter Underwood
On 2/17/09 12:26 PM, Grant Ingersoll gsing...@apache.org wrote: If purchasing, several companies offer solutions, but I don't know that their quality is any better than what you can get through open source, as generally speaking, the problem is solved with a high degree of accuracy through

making changes to solr schema

2009-02-17 Thread Jonathan Haddad
Preface: This is my first attempt at using Solr. What happens if I need to make a change to a Solr schema that's already in production? Can fields be added or removed? Can a type change from an integer to a float? Thanks in advance, Jon -- Jonathan Haddad http://www.rustyrazorblade.com

making changes to solr schema after deployed to production

2009-02-17 Thread Jonathan Haddad
Preface: This is my first attempt at using Solr. What happens if I need to make a change to a Solr schema that's already in production? Can fields be added or removed? Can a type change from an integer to a float? Thanks in advance, Jon

embedded wildcard search not working?

2009-02-17 Thread Jim Adams
This is a straightforward question, but I haven't been able to figure out what is up with my application. I seem to be able to search on trailing wildcards just fine. For example, fieldName:a* will return documents with apple, aardvark, etc. in them. But if I was to try and search on a field

Reading Core-Specific Config File in a Row Transformer

2009-02-17 Thread wojtekpia
I'm using the DataImportHandler to load data. I created a custom row transformer, and inside of it I'm reading a configuration file. I am using the system's solr.solr.home property to figure out which directory the file should be in. That works for a single-core deployment, but not for multi-core

Re: Reading Core-Specific Config File in a Row Transformer

2009-02-17 Thread Shalin Shekhar Mangar
On Wed, Feb 18, 2009 at 5:53 AM, wojtekpia wojte...@hotmail.com wrote: Is there a clean way to resolve the actual conf directory path from within a custom row transformer so that it works for both single-core and multi-core deployments? You can use Context.getSolrCore().getInstanceDir() --

Re: making changes to solr schema

2009-02-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Feb 18, 2009 at 3:37 AM, Jonathan Haddad j...@letsgetnuts.com wrote: Preface: This is my first attempt at using solr. What happens if I need to do a change to a solr schema that's already in production? Can fields be added or removed? You may need a core reload or a server restart

Data Normalization in Solr.

2009-02-17 Thread Kalidoss MM
Hi, I want to store normalized data in Solr. For example, I am splitting personal information data (fname, lname, mname) into one Solr record and addresses (personal, office) into another Solr record. The IDs are different: 123212_name and 123212_add. Now, in some cases I require both personal and

RE: Query regarding setTimeAllowed(Integer) and setRows(Integer)

2009-02-17 Thread Jana, Kumar Raja
Thanks wunder for the response. So I would like to know if I were to limit the resultset from Solr to 10 and my query actually matches, say 1000 documents, will the query processing stop the moment the search finds the first 10 documents? Or will the entire search be carried out and then sorted
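Here is a rough model of a relevance-sorted top-N search. It is a Python simplification of how Lucene's top-N collection generally works, not Lucene code. Every matching document is visited and scored, and a bounded priority queue keeps only the best N. So processing does not stop after the first 10 hits. A small rows value mainly saves the per-result work of building the response, as wunder described.

```python
import heapq

def top_n(matches, n):
    """Score every match but keep only the n best in a bounded min-heap.
    `matches` yields (doc_id, score); returns ([(score, doc_id), ...] best first, count)."""
    heap = []  # heap[0] is the weakest entry among the current top n
    visited = 0
    for doc_id, score in matches:
        visited += 1
        if len(heap) < n:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True), visited

# 1000 hypothetical matches, but only 10 requested.
best, visited = top_n(((f"d{i}", float(i)) for i in range(1000)), 10)
print(visited, [doc for _, doc in best][:3])  # 1000 ['d999', 'd998', 'd997']
```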

RE: Query regarding setTimeAllowed(Integer) and setRows(Integer)

2009-02-17 Thread Jana, Kumar Raja
Thanks Sean. That clears up the timer concept. Is there any other way through which I can make sure that the server time is not wasted? -Original Message- From: Sean Timm [mailto:tim...@aol.com] Sent: Wednesday, February 18, 2009 1:00 AM To: solr-user@lucene.apache.org Subject: Re:

Re: Data Normalization in Solr.

2009-02-17 Thread Otis Gospodnetic
Hi, There are no entity relationships in Solr and there are no joins, so the simplest thing to do in this case is to issue two requests.  You could also write a custom SearchComponent that internally does two requests and returns a single unified response. Otis -- Sematext --
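On the client side, the "two requests" approach comes down to fetching both records and merging them by their shared id prefix. Here is a minimal Python sketch. The responses are hard-coded, hypothetical doc lists standing in for the results of queries such as id:123212_name and id:123212_add.

```python
def merge_records(base_id, responses):
    """Combine docs whose ids share a prefix (e.g. 123212_name, 123212_add)
    into one dict. `responses` are doc lists from separate Solr requests."""
    merged = {"id": base_id}
    for docs in responses:
        for doc in docs:
            if doc["id"].split("_", 1)[0] == base_id:
                merged.update({k: v for k, v in doc.items() if k != "id"})
    return merged

# Hypothetical results of the two requests.
name_docs = [{"id": "123212_name", "fname": "Jane", "lname": "Doe"}]
addr_docs = [{"id": "123212_add", "office": "Main Street 1"}]
print(merge_records("123212", [name_docs, addr_docs]))
```

A custom SearchComponent, as suggested above, would move this merge to the server so that clients get a single response.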

Re: embedded wildcard search not working?

2009-02-17 Thread Otis Gospodnetic
Jim, Does app*l or even a*p* work?  Perhaps apple gets stemmed to something that doesn't end in e, such as appl? Regarding your config, you probably want to lowercase before removing stop words, so you'll want to change the order of those filters a bit.  That's not related to your wildcard