Tika0.10 language identifier in Solr3.5.0

2012-01-19 Thread nibing
Hi, all, I am using Solr3.5.0 which applies Tika0.10 to do language detection, and I have a couple of questions about this function. 1. I can see the outcome of the language detection in a field language_s. But what action will be taken according to the different language code? How to

Re: What is meant by score and maxscore solr

2012-01-19 Thread Rafał Kuć
Hello! The score of a document is a value calculated for that document in the context of a given query, showing how well the document 'fits' the query. The maxscore is the maximum score calculated by Lucene for a given query. Please have a look at

Re: Tika0.10 language identifier in Solr3.5.0

2012-01-19 Thread Alessio Crisantemi
Dear all, how can I index a complete directory with many PDF files in Solr? Alessio Crisantemi Direttore Responsabile Gioconews.it www.gioconews.it t: (+39)0744461296 f: (+39)0744461362 bb: (+39)3477939054 e: alessio.crisant...@gioconews.it

re: StreamingUpdateSolrServer - connection refused

2012-01-19 Thread Poulton, Gareth | Gareth | DU
Hi all, I’m having basically the exact same problem someone described in this email to the list from just over a year ago (see below). The only suggested solution given on the thread at the time was to ping the server before sending an add, which I’m not particularly keen on; least of all

Src code download url needed for SOLR 3.5

2012-01-19 Thread mechravi25
Hi, I would like to know if there is any way I can get the source code (baselined code) for SOLR version 3.5. I only got the code for the 3.x version (3.6). Is it possible to check out the 3.5 code from svn in Eclipse, or is there a zip file available for it? Thanks. -- View

Re: Src code download url needed for SOLR 3.5

2012-01-19 Thread Rafał Kuć
Hello! Please look at one of the mirrors. There should be a package apache-solr-3.5.0-src.tgz which contains the source code. For example, the following link should work: http://ftp.tpnet.pl/vol/d1/apache//lucene/solr/3.5.0/apache-solr-3.5.0-src.tgz -- Regards, Rafał Kuć Hi, I would

Searching partial phone numbers

2012-01-19 Thread marotosg
Hi. I have phone numbers in my Solr schema in a field. At the moment I have this field as a string. I would like to be able to make searches that find parts of a phone number. For instance: Number +35384589458 search by *+35384* or search by *84589*. Do you know if this is possible? Thanks

Re: Searching partial phone numbers

2012-01-19 Thread Patrick Plaatje
Hi Marotosg, you can index the phonenumber field with the ngram field type, which allows for partial (wildcard) searches on this field. Have a look here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.CommonGramsFilterFactory Cheers, Patrick 2012/1/19 marotosg
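A rough illustration of why n-gram indexing makes the partial searches above work, written as plain Python rather than Solr configuration. The function name `ngrams` and the gram sizes are assumptions made for this sketch; in Solr the expansion would be done by an n-gram filter in the field's index-time analyzer.

```python
def ngrams(text, min_size=3, max_size=15):
    """Return every substring of text with a length between min_size and max_size."""
    grams = set()
    for size in range(min_size, min(max_size, len(text)) + 1):
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


# Index time: the stored number is expanded into all of its n-grams.
indexed = ngrams("+35384589458")

# Query time: a partial number matches if it equals one of the indexed grams.
print("+35384" in indexed)
print("84589" in indexed)
print("99999" in indexed)
```

The trade-off is index size: every number is stored as many overlapping fragments, so the minimum and maximum gram sizes are worth tuning.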

Re: How to accelerate your Solr-Lucene appication by 4x

2012-01-19 Thread Darren Govoni
I think the occasional "Hey, we made something cool you might be interested in!" notice, even if commercial, is ok because it addresses numerous issues we struggle with on this list. Now, if it were something completely off-base or unrelated (e.g. male enhancement pills), then yeah, I agree.

Re: How can a distributed Solr setup scale to TB-data, if URL limitations are 4000 for distributed shard search?

2012-01-19 Thread Daniel Bruegge
On Thu, Jan 19, 2012 at 4:51 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Huge is relative. ;) Huge Solr clusters also often have huge hardware. Servers with 16 cores and 32 GB RAM are becoming very common, for example. Another thing to keep in mind is that while lots of

Re: Question on Reverse Indexing

2012-01-19 Thread Dmitry Kan
Oh, I see, I hadn't noticed you used Solr 4.0. Luke can only read 3.5 at most, at the moment. So when you search with a leading wildcard, do both your app and the SOLR admin search give the same results? Perhaps you can show the relevant parts of your schema and solrconfig? Like type(s) definition

Re: How to accelerate your Solr-Lucene appication by 4x

2012-01-19 Thread Patrick Plaatje
Partially agree. If just the facts are given, and not a complete sales talk instead, it'll be fine. Don't overdo it like this though. Cheers, Patrick 2012/1/19 Darren Govoni dar...@ontrenet.com I think the occasional "Hey, we made something cool you might be interested in!" notice, even if

RE: Question on Reverse Indexing

2012-01-19 Thread Shyam Bhaskaran
Dmitry, Yes, my app and the Solr admin search are giving me similar results. Excerpt from schema.xml: As you can see, solr.ReversedWildcardFilterFactory is commented out. <types> <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/> <fieldType

Re: conditional field weighting

2012-01-19 Thread Jan Høydahl
Hi, Try DisMax parser with tie parameter: q=solr&qf=name^10.03 description^10.02 location^10.01&tie=0.5&defType=dismax What will happen now is that the field which scores HIGHEST for the term will win the max score (10). If all things are equal, name will win above because it has slightly higher
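To make the effect of the tie parameter concrete: DisMax takes the best-scoring field and adds the scores of the other matching fields multiplied by tie. The sketch below reproduces that arithmetic in Python; the per-field scores are made-up numbers chosen for illustration, not values Solr would produce for this query.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores the DisMax way: best field plus tie times the rest."""
    best = max(field_scores.values())
    others = sum(field_scores.values()) - best
    return best + tie * others


# Hypothetical scores of one document for the term "solr" in each field.
scores = {"name": 2.0, "description": 1.0, "location": 0.5}

print(dismax_score(scores, tie=0.0))  # pure "max": only the best field counts
print(dismax_score(scores, tie=0.5))  # other fields contribute half their score
print(dismax_score(scores, tie=1.0))  # plain sum of all fields
```

With tie=0.0 a document matching in three fields scores the same as one matching only in its best field; raising tie rewards documents that match in several fields.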

Re: index-time over boosted

2012-01-19 Thread Jan Høydahl
Hi, Can you paste exactly both fieldType and field definitions from your schema? omitNorms=true should kill norms. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 19. jan. 2012, at 08:18, remi tassing wrote: Hi, just a

Re: Tika0.10 language identifier in Solr3.5.0

2012-01-19 Thread Jan Høydahl
Hi, You may use the string as you choose, for instance filtering (fq=language_s:en) or for faceting (facet.field=language_s). What are you looking to do? What would you like to detect on the query side? The language of the search string? That is very hard since people type very few words into

Re: How to accelerate your Solr-Lucene appication by 4x

2012-01-19 Thread Darren Govoni
Agree. There's probably some unwritten etiquette there. On 01/19/2012 05:52 AM, Patrick Plaatje wrote: Partially agree. If just the facts are given, and not a complete sales talk instead, it'll be fine. Don't overdo it like this though. Cheers, Patrick 2012/1/19 Darren

Re: index-time over boosted

2012-01-19 Thread remi tassing
Hello Jan, My schema wasn't changed from the release 3.5.0. The content can be seen below: <schema name="nutch" version="1.1"> <types> <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/> <fieldType name="long" class="solr.LongField"

Re: Question on Reverse Indexing

2012-01-19 Thread Dmitry Kan
A quick immediate observation: first, in the analysis and query chains you have some custom tokenizer factory. Could it, by some chance, affect the leading wildcard setting? This setting does not require storing the reversed tokens in the index. It is just run-time leading wildcard expansion

Just can't get Solritas to work, help!

2012-01-19 Thread remi tassing
Hi, I tried everything I could, changed version but nada! Is there a working tutorial on how to make Nutch, Solr and Solritas work? Remi

Re: Just can't get Solritas to work, help!

2012-01-19 Thread remi tassing
I can get the error: HTTP ERROR 400 Problem accessing /solr/browse. Reason: undefined field cat -- *Powered by Jetty://* On Thu, Jan 19, 2012 at 2:44 PM, remi tassing tassingr...@gmail.com wrote: Hi, I tried everything I could, changed version but nada! Is

RE: Question on Reverse Indexing

2012-01-19 Thread Shyam Bhaskaran
Dmitry, Our custom tokenizer is similar to the standard tokenizer. Just for testing I replaced my custom tokenizer with solr.StandardTokenizerFactory and performed the index again, but I still observe the same behavior. We have built Solr Lucene 4.0 from the source code trunk. You say that

Re: first time query is very slow

2012-01-19 Thread Yonik Seeley
On Wed, Jan 18, 2012 at 10:15 PM, gabriel shen xshco...@gmail.com wrote: Hi Yonik, The index I am querying against is 20 GB, containing 200,000 documents; some of the documents are quite big, and the schema contains more than 50 fields. The main content fields are defined as both stored and indexed,

Re: Just can't get Solritas to work, help!

2012-01-19 Thread Erik Hatcher
/browse is defined in solrconfig.xml. Its details need adjusting for datasets other than the example data that ships with Solr. Templates may also need adjusting, but they do handle arbitrary facet fields automatically. Erik On Jan 19, 2012, at 7:56, remi tassing tassingr...@gmail.com wrote:

Re: Just can't get Solritas to work, help!

2012-01-19 Thread remi tassing
I think I get your point. Is there any solrconfig.xml sample that works with Nutch in a default configuration? Just something to start playing with. Remi On Thu, Jan 19, 2012 at 3:02 PM, Erik Hatcher erik.hatc...@gmail.com wrote: /browse is defined solrconfig.xml. Its details need adjusting for

Re: Just can't get Solritas to work, help!

2012-01-19 Thread Nicholas Fellows
Heya, Question for you guys, I'm trying to use the Solr analysis.jsp tool to debug a Solr query. I can't work out how to input sample data for the Field Value (Index) box when the data is multiValued. I was wondering if you could explain how to do this or point me to the documentation where this

Re: Just can't get Solritas to work, help!

2012-01-19 Thread remi tassing
Hey Nick, could you plz create a new thread? Remi On Thu, Jan 19, 2012 at 3:35 PM, Nicholas Fellows n...@djdownload.com wrote: Heya, Question for you guys, Im trying to use the solr analysis.jsp tool to debug a solr query. I cant work out how to input sample data for the Field Value

SolrPhpClient not working

2012-01-19 Thread jawedshamshedi
I have a question. SolrPhpClient is not working with multicore. I have two cores in my Solr, say core1 and core2. When creating the SolrPhpClient object I am using the following syntax: $solr = new Apache_Solr_Service('192.168.12.226',

Re: index-time over boosted

2012-01-19 Thread Jan Høydahl
Hi, The schema you pasted in your mail is NOT Solr3.5's default example schema. Did you get it from the Nutch project? And the omitNorms parameter is supposed to go in the field tag in schema.xml, and the content field in the example schema does not have omitNorms=true. Try to change

ampersands in index or query

2012-01-19 Thread Nicholas Fellows
I have some data in Solr where the text string could potentially be "Vic & Bobs greatest hits". How can I ensure that when a user query is made for "Vic and Bobs greatest hits", a match is made? This also needs to work the other way round. I've not found any useful information about this scenario

Re: Just can't get Solritas to work, help!

2012-01-19 Thread Nicholas Fellows
Sincere apologies My Bad! N ... On 19 January 2012 13:37, remi tassing tassingr...@gmail.com wrote: Hey Nick, could you plz create a new thread? Remi On Thu, Jan 19, 2012 at 3:35 PM, Nicholas Fellows n...@djdownload.comwrote: Heya,  Question for you guys, Im trying to use the solr

testing MultiValued fields in analyis.jsp

2012-01-19 Thread Nicholas Fellows
Heya, Question for you guys, I'm trying to use the Solr analysis.jsp tool to debug a Solr query. I can't work out how to input sample data for the Field Value (Index) box when the data is multiValued. I was wondering if you could explain how to do this or point me to the documentation where this is

XPathEntityProcessor - excluding XML nodes based on comparison to another node from the same entity

2012-01-19 Thread Zajkowski, Radoslaw
Hi all, I am indexing some XML data which describes publications like brochures, manuals etc. The XML contains a section perfect for faceting that looks like this: <search_keywords> <search_keyword>M2M4-6655ENW</search_keyword> <search_keyword>folding cartons</search_keyword>

RE: Highlighting more than 1 term

2012-01-19 Thread csscouter
Nitin (and any other interested parties here): Unfortunately, re-indexing the content did not resolve the problem and the symptom remains the same. Any additional advice is appreciated. Tim -- View this message in context:

slave index much larger than master index

2012-01-19 Thread anna headley
Hello solr-user list, I appear to have a number of issues right now on my slave server. 1. The most confusing one is that my slave index is currently 67 gigs, but my master index is only 27 gigs. Has anyone seen this before? Has anyone an idea of what could cause this? 2. I haven't been

Does it make sense to configure newSearcher and firstSearcher on a Solr Master instance?

2012-01-19 Thread Daniel Brügge
Hi, I am currently running multiple Solr instances and often write data to them. I also query them. Both work fine right now, because I don't have so many search requests. For querying I noticed that the firstSearcher and newSearcher static warming with one facet query really brings a

Indexing Pdf Portfolios

2012-01-19 Thread Lucas Simão
Hello, I am trying to index PDF files in Solr. When the PDF file is simple everything is fine, but when I use a PDF Portfolio (http://help.adobe.com/en_US/Acrobat/9.0/Standard/WSA2872EA8-9756-4a8c-9F20-8E93D59D91CE.html ) with Tika it does not work. Does anyone know how to extract data

RE: How to accelerate your Solr-Lucene appication by 4x

2012-01-19 Thread Steven A Rowe
I want to retract my objection to commercial messages. I think Ted's position is more reasonable: on-topic commercial messages that are responsive to (and maybe even anticipatory of) users' needs will likely be welcomed by many subscribed here. Producing a policy statement that perfectly

Re: ampersands in index or query

2012-01-19 Thread Gora Mohanty
On Thu, Jan 19, 2012 at 7:48 PM, Nicholas Fellows n...@djdownload.com wrote: I have some data in solr where the text string could potentially be "Vic & Bobs greatest hits" how can i ensure that when a user query is made for "Vic and Bobs greatest hits", a match is made? this also needs to

Re: How to accelerate your Solr-Lucene appication by 4x

2012-01-19 Thread Gora Mohanty
On Thu, Jan 19, 2012 at 9:32 PM, Steven A Rowe sar...@syr.edu wrote: I want to retract my objection to commercial messages.  I think Ted's position is more reasonable: on-topic commercial messages that are responsive to (and maybe even anticipatory of) users' needs will likely be welcomed by

Re: How to import data from xml files to solr

2012-01-19 Thread Gora Mohanty
On Thu, Jan 19, 2012 at 5:56 PM, solr lakshmi2...@gmail.com wrote: How to import data from xml files to solr. is this is the right command java -jar post.jar sample.xml [...] This should work, but please take a look at the format of the sample XML files: Solr expects a specific format for

Re: Trying to understand SOLR memory requirements

2012-01-19 Thread Dave
I'm also seeing the error when I try to start up the SOLR instance: SEVERE: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:344) at org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:352) at

Re: How to return the distance geo distance on solr 3.5 with bbox filtering

2012-01-19 Thread Maxim Veksler
It works. The query: * http://localhost:8983/solr/select?indent=true&fq={!bbox}&sfield=loc&pt=34.0415954,-118.298797&d=1000.0&sort=score%20asc&fq=trafficRouteId:887&q={!func}geodist()&fl=*,score&rows=1 * works perfectly, doing all the filtering needed and returning the distance as score. Thank you very
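The parameters of the query in this message can also be assembled programmatically, which avoids hand-escaping mistakes. The sketch below uses only the Python standard library and assumes the spatial field is named loc and the point parameter is pt, reading `sfield=loc` and `pt=34.0415954,-118.298797` from the URL in the message; host and port are copied from it as well.

```python
from urllib.parse import urlencode, parse_qs

# A list of pairs (not a dict) because fq is given twice.
params = [
    ("indent", "true"),
    ("fq", "{!bbox}"),               # bounding-box filter around pt with distance d
    ("sfield", "loc"),               # assumed name of the spatial field
    ("pt", "34.0415954,-118.298797"),
    ("d", "1000.0"),
    ("sort", "score asc"),           # nearest first, since score is the distance
    ("fq", "trafficRouteId:887"),
    ("q", "{!func}geodist()"),       # function query: distance becomes the score
    ("fl", "*,score"),
    ("rows", "1"),
]

query_string = urlencode(params)
url = "http://localhost:8983/solr/select?" + query_string
print(url)
```

Because q is the geodist() function query, the score column in the response is the distance itself, which is why sorting by ascending score returns the nearest document.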

Re: Trying to understand SOLR memory requirements

2012-01-19 Thread Robert Muir
I don't think the problem is FST, since it sorts offline in your case. More importantly, what are you trying to put into the FST? it appears you are indexing terms from your term dictionary, but your term dictionary is over 1GB, why is that? what do your terms look like? 1GB for 2,784,937

RE: How to accelerate your Solr-Lucene appication by 4x

2012-01-19 Thread Steven A Rowe
Jason, If I understand you correctly, you're referring to a thread http://search-lucene.com/m/iMCFOqzcmS1/%22Performance+Monitoring+SaaS+for+Solr%22/v=threaded in which you objected to a commercial tagline. At the time that thread was active, I didn't agree with you, though I didn't engage

Re: Trying to understand SOLR memory requirements

2012-01-19 Thread Dave
In my original post I included one of my terms: Brooklyn, New York, United States?{ |id|: |2620829|, |timezone|:|America/New_York|,|type|: |3|, |country|: { |id| : |229| }, |region|: { |id| : |3608| }, |city|: { |id|: |2616971|, |plainname|: |Brooklyn|, |name|: |Brooklyn, New York, United States|

Re: replication, disk space

2012-01-19 Thread Jonathan Rochkind
Thanks for the response. I am using Linux (RedHat). It sounds like it may possibly be related to that bug. But the thing is, the timestamped index directory is looking to me like it's the _current_ one, with the non-timestamped one being an old out of date one. So that does not seem to be

Re: How to return the distance geo distance on solr 3.5 with bbox filtering

2012-01-19 Thread Mikhail Khludnev
On Thu, Jan 19, 2012 at 8:29 PM, Maxim Veksler ma...@vekslers.org wrote: It works. The query: * http://localhost:8983/solr/select?indent=true&fq={!bbox}&sfield=loc&pt=34.0415954,-118.298797&d=1000.0&sort=score%20asc&fq=trafficRouteId:887&q={!func}geodist()&fl=*,score&rows=1 * works perfectly,

Re: replication, disk space

2012-01-19 Thread Jonathan Rochkind
Hmm, I don't have a replication.properties file, I don't think. Oh wait, yes I do there it is! I guess the replication process makes this file? Okay I don't see an index directory in the replication.properties file at all though. Below is my complete replication.properties. So I'm

Re: replication, disk space

2012-01-19 Thread Jonathan Rochkind
On 1/18/2012 1:53 PM, Tomás Fernández Löbbe wrote: As far as I know, the replication is supposed to delete the old directory index. However, the initial question is why is this new index directory being created. Are you adding/updating documents in the slave? what about optimizing it? Are you

Re: replication, disk space

2012-01-19 Thread Jonathan Rochkind
Okay, I do have an index.properties file too, and THAT one does contain the name of an index directory. But it's got the name of the timestamped index directory! Not sure how that happened, could have been Solr trying to recover from running out of disk space in the middle of a replication?

Re: Trying to understand SOLR memory requirements

2012-01-19 Thread Robert Muir
I really don't think you should put a huge JSON document in as a search term. Just make "Brooklyn, New York, United States", or whatever you intend the user to actually search on/type in, your search term. Put the rest in different fields (e.g. stored-only, not even indexed if you don't need that) and

Re: How to accelerate your Solr-Lucene appication by 4x

2012-01-19 Thread Ted Dunning
Peter, My guess is that if you had said something along the lines of "We have developed some SSD support software that makes SOLR work better. I would like to open a conversation here (link to external discussion)", that would have been reasonably well received. One of the things that makes SPAM

RE: replication, disk space

2012-01-19 Thread Dyer, James
You can do all the steps to rename the timestamp dir back to index, but I don't think you have to. Solr will know on restart to use the timestamped directory so long as it is in the properties file (sorry, I must have told you to look at the wrong file...I'm working on old memories

Re: How to import data from xml files to solr

2012-01-19 Thread solr
Hi Gora, thanks for your reply. I am new to Solr. I have been going through the Solr tutorial and noticed the example XML files in the exampledocs folder of the Solr distribution. 1. Does indexing here mean importing data into Solr? If I want to start a new example instead of the one in the Solr distribution, how do I proceed? I am a bit confused about solr

Re: Sorting results within the fields

2012-01-19 Thread Nitin Arora
Anybody have any suggestions or hints? ~Nitin -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-results-within-the-fields-tp3656049p3673371.html Sent from the Solr - User mailing list archive at Nabble.com.

Question about sorting by a field

2012-01-19 Thread federico.wachs
Hi all, I've been working with solr for a few months now and so far I only had a few issues trying to implement some functionality, but this has gone above my current solr skills, so any help or guidance is greatly appreciated. I have a multivalued field consited which contains destinations like:

restrict fuzzy search to longer words

2012-01-19 Thread Lance
HI, Could you please help me with a quick question - Is there a way to restrict lucene/solr fuzzy search to only analyze words that have more than 5 characters and to ignore words with less than that (i.e. less than 6 character words)? Thanks - Lance

RE: Question about sorting by a field

2012-01-19 Thread Michael Ryan
How about having a single-valued field named firstDestination that has the first destination in the list, and then your query could be something like 'destination:Buenos Aires firstDestination:Buenos Aires'. Docs that match both should have a higher score and thus will be listed first.
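The idea can be tried out on a toy data set before touching the schema. In the sketch below, firstDestination is derived from the multivalued destination field at "index time", and a document gets one point for each clause it matches. The documents and the additive scoring are assumptions for illustration only; real Lucene scoring is more involved, but the ordering effect is the same.

```python
docs = [
    {"id": "a", "destination": ["Madrid", "Buenos Aires"]},
    {"id": "b", "destination": ["Buenos Aires", "Lima"]},
    {"id": "c", "destination": ["Lima"]},
]


def with_first_destination(doc):
    """Copy the first value of the multivalued field into a single-valued field."""
    enriched = dict(doc)
    enriched["firstDestination"] = doc["destination"][0]
    return enriched


def toy_score(doc, wanted):
    """One point per matching clause: destination:wanted and firstDestination:wanted."""
    score = 0
    if wanted in doc["destination"]:
        score += 1
    if doc["firstDestination"] == wanted:
        score += 1
    return score


indexed = [with_first_destination(d) for d in docs]
matches = [d for d in indexed if toy_score(d, "Buenos Aires") > 0]
ranked = sorted(matches, key=lambda d: toy_score(d, "Buenos Aires"), reverse=True)
print([d["id"] for d in ranked])
```

Document b ranks ahead of a because it matches both clauses, while c does not match at all.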

Re: How to accelerate your Solr-Lucene appication by 4x

2012-01-19 Thread Jason Rutherglen
2. they always *follow* on-topic discussion Not in the example given. 3. the line is blurry, e.g. nobody will object to including one's employer in a tagline. Product placement is not blurry. The incentive is to then answer someone else's user email, in order to post yet another spam'd

Re: Takes a while to see changes in data even after comit

2012-01-19 Thread abhayd
Hi, What Solr version? 4.0. How many docs? 700. What do you use as autowarm count? 700. If it's too high, it may take time. Do you use spellcheck and buildOnCommit? No, we don't use this. -- View this message in context:

using solr for time series data

2012-01-19 Thread Robert Stewart
I have a project where the client wants to store time series data (maybe in SOLR if it can work). We want to store daily prices over the last 20 years (about 6000 values with associated dates), for up to 500,000 entities. This data currently exists in a SQL database. Access to SQL is too slow for

RE: Different mm for spellcheckquery

2012-01-19 Thread Dyer, James
I'm not sure there is a good way to do this currently. I think you'd just have to issue a second query with mm=100 to get additional spelling suggestions as maxCollationTries is designed to replicate the original query when trying collations for hits. It might be a worthy enhancement to

Re: Src code download url needed for SOLR 3.5

2012-01-19 Thread lboutros
Hello, you can get the source code from the svn repository too : http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_5/ Ludovic. - Jouve France. -- View this message in context:

3.5.0 troubles

2012-01-19 Thread Wayne W
Hi, I'm trying to set up the latest version of Solr. Currently we're running 1.3 so we're a bit out of date! Having trouble setting up the tika/extractionhandler jars etc, but I think I'm nearly there. However I've got this stack trace, that's complaining about a required field missing. However as

Re: using solr for time series data

2012-01-19 Thread Ted Dunning
Take a look at openTSDB. You might want to use that as is, or steal some of the concepts. The major idea to snitch is the idea of using a single row of the database (document in Lucene or Solr) to hold many data points. Thus, you could consider having documents with the following fields: key:
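A minimal sketch of the "many data points per document" idea, assuming one document per entity and calendar year. The field list in the message above is cut off, so the field names used here (key, entity, year, points) are hypothetical and are not a reconstruction of what the author proposed.

```python
from collections import defaultdict
from datetime import date


def pack_by_year(entity_id, daily_prices):
    """Group (date, price) pairs into one document per entity and year."""
    buckets = defaultdict(list)
    for day, price in daily_prices:
        buckets[day.year].append((day.isoformat(), price))
    return [
        {
            "key": f"{entity_id}-{year}",   # hypothetical unique key
            "entity": entity_id,
            "year": year,
            "points": sorted(points),       # all of that year's data points
        }
        for year, points in sorted(buckets.items())
    ]


prices = [
    (date(2011, 12, 30), 10.5),
    (date(2012, 1, 3), 10.8),
    (date(2012, 1, 4), 11.0),
]
docs = pack_by_year("ACME", prices)
print(len(docs))
```

With roughly 6000 daily values per entity over 20 years, this kind of grouping would give about 20 documents per entity instead of 6000, at the cost of unpacking the points on the client side.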

Field Collapsing / Result Grouping

2012-01-19 Thread cmathur
Hi, I want to know if these are possible using the FieldCollapsing/ResultGrouping feature. 1) I want to group SKUs based on a certain attribute. So I have an attribute against a SKU that I will use for grouping. Suppose I limit the result set to 20 to display on the first page. Will I get a total

Re: Takes a while to see changes in data even after comit

2012-01-19 Thread Jan Høydahl
Hi, Try lowering your autowarm to, say, 25, and see if it helps. How often do you call commit? If you have so much warming that it takes longer than the time between commits, you're lost... You can check the stats admin page to see the autowarm time. -- Jan Høydahl, search solution architect

Re: 3.5.0 troubles

2012-01-19 Thread Jan Høydahl
Shouldn't it be literal.uid=foo, not ext.literal.uid ?? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 19. jan. 2012, at 23:08, Wayne W wrote: HI, I'm trying to setup the latest version of Solr. Currently we're running 1.3

Re: testing MultiValued fields in analyis.jsp

2012-01-19 Thread Erick Erickson
I'm missing some stuff here. The analysis page has nothing to do with actual indexing. All it does is take the input and run it through the defined chains and show what tokens come out the other end and why. There's really nothing to do whatsoever with multiValued, that's just orthogonal to what

Re: Does it make sense to configure newSearcher and firstSearcher on a Solr Master instance?

2012-01-19 Thread Erick Erickson
It's generally recommended that you do the indexing on the master and searches on the slaves. In that case, firstSearcher and newSearcher sections are irrelevant on the master and shouldn't be there. I don't understand why you would need 5 more machines, are you sharding? Best Erick On Thu, Jan

Re: ampersands in index or query

2012-01-19 Thread Erick Erickson
Another approach is to use stopwords and an appropriate analyzer chain. Then both the "&" and the "and" would be removed from the indexing stream and the query process, and it would just work. Best Erick On Thu, Jan 19, 2012 at 8:09 AM, Gora Mohanty g...@mimirtech.com wrote: On Thu, Jan 19, 2012 at

Re: How to import data from xml files to solr

2012-01-19 Thread Erick Erickson
You cannot directly import arbitrary XML. You'd have to read it into a program (look at SolrJ), parse and add it to SolrInputDocuments and send to Solr. Alternately, you could read the XML and transform it to solr friendly XML and then index those files. A third possibility is to use the Data
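As a sketch of the second option (transforming arbitrary XML into Solr-friendly XML), the Python below converts a made-up source document into the `<add><doc><field name="...">` layout used by the example files that post.jar sends. The source structure and the field names id and title are assumptions for illustration; the field names must match whatever schema.xml actually defines.

```python
import xml.etree.ElementTree as ET

# Hypothetical source XML that Solr could not index as-is.
SOURCE = """<books>
  <book isbn="111"><title>Solr Basics</title></book>
  <book isbn="222"><title>Lucene Notes</title></book>
</books>"""


def to_solr_add_xml(source_xml):
    """Map each <book> element to a Solr <doc> inside a single <add> message."""
    add = ET.Element("add")
    for book in ET.fromstring(source_xml).findall("book"):
        doc = ET.SubElement(add, "doc")
        ET.SubElement(doc, "field", name="id").text = book.get("isbn")
        ET.SubElement(doc, "field", name="title").text = book.findtext("title")
    return ET.tostring(add, encoding="unicode")


solr_xml = to_solr_add_xml(SOURCE)
print(solr_xml)
```

The resulting string can be written to a file and posted in the same way as the example documents.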

HIbernate Search and SOLR Integration

2012-01-19 Thread Anderson vasconcelos
Hi. Is it possible to integrate Hibernate Search with SOLR? I want to use Hibernate Search on my entities and use SOLR to do the indexing and searching. Hibernate Search would call SOLR to search the index and then find the respective objects in the database. Is that possible? Is there some configuration for

RE: Tika0.10 language identifier in Solr3.5.0

2012-01-19 Thread nibing
Hi, Jan Høydahl You are right. I am hoping to detect the language of a query, so that the searching can be done according to the language detected. Since people often type only a few words, which is too few to detect, it is hard to do that. Let me describe a little bit about the solr

Re: Tika0.10 language identifier in Solr3.5.0

2012-01-19 Thread Ted Dunning
Normally this is done by putting a field on each document rather than separating the documents into separate corpora. Keeping them together makes the final search faster. At query time, you can add all of the language keys that you think are relevant based on your language id applied to the

Re: Src code download url needed for SOLR 3.5

2012-01-19 Thread mechravi25
Hi, Thanks a lot for your reply, will try to get the code from the repository you provided. Thanks, Ravi -- View this message in context: http://lucene.472066.n3.nabble.com/Src-code-download-url-needed-for-SOLR-3-5-tp3671810p3674513.html Sent from the Solr - User mailing list archive at

Re: How to accelerate your Solr-Lucene appication by 4x

2012-01-19 Thread Otis Gospodnetic
Hi Peter, Has anyone else tried adding SSDs as a cache to boost the performance of Solr clusters? Can you share your results? What do you mean by using SSD *as a cache*? A few years ago, Toke Eskildsen and his colleagues compared Lucene performance with traditional HDDs and SSDs and of

Re: How can a distributed Solr setup scale to TB-data, if URL limitations are 4000 for distributed shard search?

2012-01-19 Thread Otis Gospodnetic
Hi Daniel, - Original Message - From: Daniel Bruegge daniel.brue...@googlemail.com To: solr-user@lucene.apache.org; Otis Gospodnetic otis_gospodne...@yahoo.com Cc: Sent: Thursday, January 19, 2012 5:49 AM Subject: Re: How can a distributed Solr setup scale to TB-data, if URL

Re: Ngram autocompleter and term frequency boosting

2012-01-19 Thread Otis Gospodnetic
Cuong, If, when you are indexing your AC suggestions, you know "Java Developer" appears twice in the index, why not give it an appropriate index-time boost? Wouldn't that work for you? Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html

Re: Tika0.10 language identifier in Solr3.5.0

2012-01-19 Thread Otis Gospodnetic
Hi, I think guessing the language based purely on the query string is OK *if* you are OK with it not being very accurate and with finding ways to work around that, say by giving users the option to switch to another language easily, allowing them to easily select a default language for them in the future,

Re: HIbernate Search and SOLR Integration

2012-01-19 Thread Otis Gospodnetic
Hi Anderson, Not sure if you saw http://wiki.apache.org/solr/DataImportHandler Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html - Original Message - From: Anderson vasconcelos anderson.v...@gmail.com To: solr-user

Re: How to accelerate your Solr-Lucene appication by 4x

2012-01-19 Thread Fuad Efendi
I agree that SSD boosts performance... In some rare not-real-life scenario: - super frequent commits That's it, nothing more except the fact that Lucene compile time including tests takes up to two minutes on MacBook with SSD, or forty-fifty minutes on Windows with HDD. Of course, with non-empty

Re: How to import data from xml files to solr

2012-01-19 Thread solr
Are indexing XML files and importing data from XML the same or different concepts in Solr? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-import-data-from-xml-files-to-solr-tp3672193p3674641.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: 3.5.0 troubles

2012-01-19 Thread Wayne W
Hi Jan, In Solr 1.3 we used that format. I'll give it a go Thx On Friday, January 20, 2012, Jan Høydahl jan@cominvent.com wrote: Shouldn't it be literal.uid=foo, not ext.literal.uid ?? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training -

Re: How to import data from xml files to solr

2012-01-19 Thread Gora Mohanty
On Fri, Jan 20, 2012 at 11:00 AM, solr lakshmi2...@gmail.com wrote: Is indexing xml files and import data from xml is same or different in solr concept. They end up doing the same thing, which is getting the data into Solr. There are various ways of doing this, as Erick has pointed out, and

RE: Tika0.10 language identifier in Solr3.5.0

2012-01-19 Thread nibing
Hi, Ted Dunning, Thank you for your reply. I can understand your point on putting a language_s field and then keeping all the files together, which speeds up searching. But then there is the problem of which analyzer to use in indexing. I assume files encoded in different language should be

Re: How to accelerate your Solr-Lucene appication by 4x

2012-01-19 Thread Ted Dunning
Actually, for search applications there is a reasonable amount of evidence that holding the index in RAM is actually more cost effective than SSD's because the throughput is enough faster to make up for the price differential. There are several papers out of UMass that describe this trade-off,

Re: Ngram autocompleter and term frequency boosting

2012-01-19 Thread Andrew Harvey
With Solr 4.0 you could use relevance functions to give a query time boost if you don't have the information at index time. Alternatively you could do term facet based autocomplete which would mean you could sort by count rather than any other input. Andrew Sent on the run. On 20/01/2012,

Serving spell checked content

2012-01-19 Thread Bojan Miletic
Hi everyone. I'm having a bit of a problem and was hoping you could help me. My Solr instance is getting a lot of wrongly spelled data in its input, and I was wondering whether there is a way to make Solr perform a spell check on data before importing it or, if that's not possible, perform a spell check on results