RE: Indexing / querying multiple data types

2010-02-09 Thread stefan.maric
Sven, in my data-config.xml I have the following: <document> <entity name="name1" query="select id, atomID, name, description from v_1" /> <entity name="name2" query="select id, atomID, name, description from V_2" /> </document> In my schema.xml I have field

Unsubscribe from mailing list

2010-02-09 Thread Abin Mathew
Please unsubscribe me from the mailing list.

Re: Dynamic fields with more than 100 fields inside

2010-02-09 Thread Xavier Schepler
Shalin Shekhar Mangar wrote: On Mon, Feb 8, 2010 at 9:47 PM, Xavier Schepler xavier.schep...@sciences-po.fr wrote: Hey, I'm thinking about using dynamic fields. I need one or more user-specific fields in my schema, for example, concept_user_*, and I will probably have more than 200 users

Posting pdf file and posting from remote

2010-02-09 Thread alendo
I understand that Tika is able to index PDF content; is that true? I tried to post a PDF from local and I've seen another document in the solr/admin schema browser, but when I search, only the document id is available; the document doesn't seem to be indexed. Do I need other products to index PDF content?

Re: Dynamic fields with more than 100 fields inside

2010-02-09 Thread Shalin Shekhar Mangar
On Tue, Feb 9, 2010 at 2:43 PM, Xavier Schepler xavier.schep...@sciences-po.fr wrote: Shalin Shekhar Mangar wrote: On Mon, Feb 8, 2010 at 9:47 PM, Xavier Schepler xavier.schep...@sciences-po.fr wrote: Hey, I'm thinking about using dynamic fields. I need one or more user-specific

Re: Dynamic fields with more than 100 fields inside

2010-02-09 Thread Xavier Schepler
Shalin Shekhar Mangar wrote: On Tue, Feb 9, 2010 at 2:43 PM, Xavier Schepler xavier.schep...@sciences-po.fr wrote: Shalin Shekhar Mangar wrote: On Mon, Feb 8, 2010 at 9:47 PM, Xavier Schepler xavier.schep...@sciences-po.fr wrote: Hey, I'm thinking about using

DIH: delta-import not working

2010-02-09 Thread Jorg Heymans
Hi, I am having problems getting the delta-import to work for my schema. Following what I have found in the list, JIRA and the wiki, the configuration below should just work, but it doesn't. dataConfig dataSource name=ora driver=oracle.jdbc.OracleDriver url=jdbc:oracle:thin:@. user=

Re: unloading a solr core doesn't free any memory

2010-02-09 Thread Tim Terlegård
I don't use any garbage collection parameters. /Tim 2010/2/8 Simon Rosenthal simon_rosent...@yahoo.com: What garbage collection parameters is the JVM using? The memory will not always be freed immediately after an event like unloading a core or starting a new searcher. 2010/2/8 Tim

Re: Posting pdf file and posting from remote

2010-02-09 Thread alendo
OK, I'm making progress (maybe :). I tried another curl command to send the file from remote: http://mysolr:/solr/update/extract?literal.id=8514&stream.file=files/attach-8514.pdf&stream.contentType=application/pdf and the behaviour has changed: now I get an error in the Solr log file: HTTP
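The stripped characters in the extract URL above are the `&` separators between request parameters. The following is a minimal Java sketch of building such a URL with each value URL-encoded. The host and port `localhost:8983` are placeholders (the original host and port are elided), and the class name `ExtractUrl` is hypothetical. Per Lance's reply further down, `stream.file` is read on the Solr server itself, so in practice it probably needs a complete server-side path rather than the relative `files/...`. Remote streaming must also be enabled in solrconfig.xml (`enableRemoteStreaming="true"` on `requestParsers`).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtractUrl {
    // Appends parameters as ?k1=v1&k2=v2..., URL-encoding each value (e.g. '/' becomes %2F).
    static String build(String base, Map<String, String> params) throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(e.getKey()).append('=')
              .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            sep = '&'; // every parameter after the first is separated by '&'
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> p = new LinkedHashMap<>();
        p.put("literal.id", "8514");
        p.put("stream.file", "files/attach-8514.pdf"); // placeholder; likely needs an absolute server path
        p.put("stream.contentType", "application/pdf");
        System.out.println(build("http://localhost:8983/solr/update/extract", p));
    }
}
```

When run from a shell, the whole URL also has to be quoted for curl, since an unquoted `&` is interpreted by the shell.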

Re: unloading a solr core doesn't free any memory

2010-02-09 Thread Tim Terlegård
If I unload the core and then click Perform GC in jconsole nothing happens. The 8 GB RAM is still used. If I load the core again and then run the query with the sort fields, then jconsole shows that the memory usage immediately drops to 1 GB and then rises to 8 GB again as it caches the stuff.

Re: TermInfosReader.get ArrayIndexOutOfBoundsException

2010-02-09 Thread Michael McCandless
Which version of Solr/Lucene are you using? Can you run Lucene's CheckIndex tool (java -ea:org.apache.lucene org.apache.lucene.index.CheckIndex /path/to/index) and then post the output? Have you altered any of IndexWriter's defaults (via solrconfig.xml)? E.g. the termIndexInterval? Mike On Mon,

joining two field for query

2010-02-09 Thread Ranveer Kumar
Hi all, I need logic in Solr to join two fields in a query. I indexed two fields: id and body (text type). 5 rows are indexed: id=1: text=nokia samsung; id=2: text=sony vaio nokia samsung; id=3: text=vaio nokia; etc. I am searching by q=id:1, which returns the result perfectly, returning

Re: DIH: delta-import not working

2010-02-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
try this deltaImportQuery=select id, bytes from attachment where application = 'MYAPP' and id = '${dataimporter.delta.id}' Be aware that the names are case-sensitive: if the id comes back as 'ID' this will not work. On Tue, Feb 9, 2010 at 3:15 PM, Jorg Heymans jorg.heym...@gmail.com wrote: Hi,
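Noble's warning about case sensitivity can be illustrated with a deliberately simplified, hypothetical resolver. It is not DIH's actual implementation. It substitutes `${dataimporter.delta.<name>}` from a row map with a case-sensitive key lookup, and in this sketch an unresolved name becomes an empty string. Oracle, for instance, reports unquoted column names in upper case, so a delta row keyed by `ID` does not satisfy a lookup for `id`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeltaPlaceholderDemo {
    // Hypothetical simplification: matches ${dataimporter.delta.<column>} and captures <column>.
    private static final Pattern DELTA =
            Pattern.compile("\\$\\{dataimporter\\.delta\\.([^}]+)\\}");

    static String resolve(String template, Map<String, Object> deltaRow) {
        Matcher m = DELTA.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = deltaRow.get(m.group(1)); // exact, case-sensitive key lookup
            String text = value == null ? "" : value.toString(); // sketch: unknown name -> ""
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String q = "select id, bytes from attachment where application = 'MYAPP'"
                + " and id = '${dataimporter.delta.id}'";
        Map<String, Object> lower = new HashMap<>();
        lower.put("id", 42);  // column reported as "id": placeholder resolves
        Map<String, Object> upper = new HashMap<>();
        upper.put("ID", 42);  // column reported as "ID": lookup for "id" misses
        System.out.println(resolve(q, lower));
        System.out.println(resolve(q, upper));
    }
}
```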

Re: Call URL, simply parse the results using SolrJ

2010-02-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
you can also try URL urlo = new URL(url); /* ensure that the url has wt=javabin in it */ NamedList<Object> namedList = (NamedList<Object>) new JavaBinCodec().unmarshal(urlo.openConnection().getInputStream()); QueryResponse response = new QueryResponse(namedList, null); On Mon, Feb 8, 2010 at 11:49 PM, Jason Rutherglen

Re: joining two field for query

2010-02-09 Thread Ahmet Arslan
I am searching by nokia and the results list 1, 2, 3 with a short description. There is a link on each search result (like Google); clicking on the link performs a new search (opening the doc from the index). For this search I want to join two fields: id:1 + queryString (nokia samsung) to return

Replication and querying

2010-02-09 Thread Julian Hille
Hi, I'd like to know if it's possible to have a Solr server with a schema and, let's say, 10 fields indexed. I now want to replicate this whole index to another Solr server which has a slightly different schema. There are 6 additional fields; these fields change the sort order for a product which

Re: joining two field for query (Solved)

2010-02-09 Thread Ranveer Kumar
Hi Ahmet, Thank you very much, my problem is solved. With regards On Tue, Feb 9, 2010 at 5:38 PM, Ahmet Arslan iori...@yahoo.com wrote: I am searching by nokia and resulting (listing) 1,2,3 field with short description. There is link on search list(like google), by clicking on link

Autosuggest and highlighting

2010-02-09 Thread gwk
Hi, I'm trying to improve the search box on our website by adding an autosuggest field. The dataset is a set of properties in the world (mostly Europe) and the search box is intended to be filled with a country-, region- or city name. To do this I've created a separate, simple core with one

Re: Solr usage with Auctions/Classifieds?

2010-02-09 Thread Jan Høydahl / Cominvent
With the new sort by function in 1.5 (http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function), will it now be possible to include the ExternalFileField value in the sort formula? If so, we could sort on last bid price or last bid time without updating the document itself. However, to

Distributed search and haproxy and connection build up

2010-02-09 Thread Ian Connor
I have been using distributed search with haproxy but noticed that I am suffering a little from tcp connections building up waiting for the OS level closing/time out: netstat -a ... tcp6 1 0 10.0.16.170%34654:53789 10.0.16.181%363574:8893 CLOSE_WAIT tcp6 1 0

Re: Autosuggest and highlighting

2010-02-09 Thread gwk
On 2/9/2010 2:57 PM, Ahmet Arslan wrote: I'm trying to improve the search box on our website by adding an autosuggest field. The dataset is a set of properties in the world (mostly Europe) and the search box is intended to be filled with a country-, region- or city name. To do this I've created a

Re: Faceting

2010-02-09 Thread Jan Høydahl / Cominvent
NOTE: Please start a new email thread for a new topic (See http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking) Your strategy could work. You might want to look into dedicated entity extraction frameworks like http://opennlp.sourceforge.net/

Re: DIH: delta-import not working

2010-02-09 Thread Jorg Heymans
indeed that made it work. Looking back at the documentation, it's all there but one needs to read every single line with care :-) 2010/2/9 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com try this deltaImportQuery=select id, bytes from attachment where application = 'MYAPP' and id =

Re: Is it possible to exclude results from other languages?

2010-02-09 Thread Jan Høydahl / Cominvent
Much more efficient to tag documents with language at index time. Look for language identification tools such as http://www.sematext.com/products/language-identifier/index.html or http://ngramj.sourceforge.net/ or

Re: joining two field for query

2010-02-09 Thread Jan Høydahl / Cominvent
You may also want to play with other highlighting parameters to select how much text to do highlighting on, how many fragments etc. See http://wiki.apache.org/solr/HighlightingParameters -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 9. feb. 2010, at 13.08, Ahmet Arslan

Re: Replication and querying

2010-02-09 Thread Jan Høydahl / Cominvent
Hi, Index replication in Solr makes an exact copy of the original index. Is it not possible to add the 6 extra fields to both instances? An alternative to replication is to feed two independent Solr instances - full control :) Please elaborate on your specific use case if this is not useful

Re: unloading a solr core doesn't free any memory

2010-02-09 Thread Jason Rutherglen
Tim, The GC just automagically works right? :) There have been issues around ThreadLocal in Lucene. The main code for core management is CoreContainer, which I believe is fairly easy to digest. If there's an issue you may find it there. Jason 2010/2/9 Tim Terlegård tim.terleg...@gmail.com:

Question on Tokenizing email address

2010-02-09 Thread Abhishek Srivastava
Hello Everyone, I have a field in my Solr schema which stores emails. I want the emails to be tokenized like this: if the email address is abc@alpha-xyz.com, the user should be able to search on 1. abc@alpha-xyz.com (whole address) 2. abc 3. def 4. alpha-xyz Which tokenizer

Re: TermInfosReader.get ArrayIndexOutOfBoundsException

2010-02-09 Thread Tom Burton-West
Thanks Lance and Michael, We are running Solr 1.3.0.2009.09.03.11.14.39 (Complete version info from Solr admin panel appended below) I tried running CheckIndex (with the -ea: switch ) on one of the shards. CheckIndex also produced an ArrayIndexOutOfBoundsException on the larger segment

RE: HTTP caching and distributed search

2010-02-09 Thread Charlie Jackson
I tried your suggestion, Hoss, but committing to the new coordinator core doesn't change the indexVersion and therefore the ETag value isn't changed. I opened a new JIRA issue for this http://issues.apache.org/jira/browse/SOLR-1765 Thanks, Charlie -Original Message- From: Chris

Re: TermInfosReader.get ArrayIndexOutOfBoundsException

2010-02-09 Thread Michael McCandless
Yes, the term count reported by CheckIndex is the total number of unique terms. It indeed looks like you are exceeding the unique term count limit -- 16777214 * 128 (= the default term index interval) is 2147483392 which is mighty close to max/min 32 bit int value. This makes sense, because

Re: TermInfosReader.get ArrayIndexOutOfBoundsException

2010-02-09 Thread Michael McCandless
I opened a Lucene issue w/ patch to try: https://issues.apache.org/jira/browse/LUCENE-2257 Tom let me know if you're able to test this... thanks! Mike On Tue, Feb 9, 2010 at 2:09 PM, Michael McCandless luc...@mikemccandless.com wrote: Yes, the term count reported by CheckIndex is the total

Re: TermInfosReader.get ArrayIndexOutOfBoundsException

2010-02-09 Thread Tom Burton-West
Thanks Michael, I'm not sure I understand. CheckIndex reported a negative number: -16777214. But in any case we can certainly try running CheckIndex from a patched Lucene. We could also run a patched Lucene on our dev server. Tom Yes, the term count reported by CheckIndex is the total

Re: TermInfosReader.get ArrayIndexOutOfBoundsException

2010-02-09 Thread Michael McCandless
I attached a patch to the issue that may fix it. Maybe start by running CheckIndex first? Mike On Tue, Feb 9, 2010 at 2:56 PM, Tom Burton-West tburtonw...@gmail.com wrote: Thanks Michael, I'm not sure I understand.  CheckIndex reported a negative number: -16777214. But in any case we can

Re: TermInfosReader.get ArrayIndexOutOfBoundsException

2010-02-09 Thread Michael McCandless
On Tue, Feb 9, 2010 at 2:56 PM, Tom Burton-West tburtonw...@gmail.com wrote: I'm not sure I understand. CheckIndex reported a negative number: -16777214. Right, we are overflowing the positive ints, which wraps around to the smallest int (-2.1 billion), and then dividing by 128 gives ~ -16777216.
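The arithmetic in Mike's two replies can be checked directly. This sketch only illustrates 32-bit two's-complement wrap-around and is not Lucene's CheckIndex code. The class and method names are hypothetical; 128 is the default term index interval quoted above.

```java
public class TermCountOverflow {
    static final int DEFAULT_TERM_INDEX_INTERVAL = 128;

    // Exact product in 64-bit arithmetic: 16777214 * 128 = 2147483392.
    static long totalTerms(long indexedTerms, int interval) {
        return indexedTerms * interval;
    }

    // Simulates a 32-bit counter pushed past Integer.MAX_VALUE and then divided by the interval.
    static int wrappedThenDivided(long trueCount, int interval) {
        int counter = (int) trueCount; // narrowing keeps the low 32 bits, so 2^31 wraps to -2^31
        return counter / interval;
    }

    public static void main(String[] args) {
        System.out.println(totalTerms(16777214L, DEFAULT_TERM_INDEX_INTERVAL)); // 2147483392
        System.out.println(Integer.MAX_VALUE);                                    // 2147483647
        System.out.println(wrappedThenDivided(Integer.MAX_VALUE + 1L,
                DEFAULT_TERM_INDEX_INTERVAL));                                    // -16777216
    }
}
```

The product sits only 255 below `Integer.MAX_VALUE`, so a little more growth past the limit wraps the count negative. Dividing the wrapped value by 128 lands near the -16777214 that CheckIndex reported.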

Solr/Drupal Integration - Query Question

2010-02-09 Thread jaybytez
I know this is not a Drupal list, but I thought this question may be more about the Solr query. For instance, I pulled down Lucid Imagination's Solr install, just like the Apache Solr install, ran the example Solr and loaded the documents from the exampledocs. I can go to:

Re: Question on Tokenizing email address

2010-02-09 Thread Jan Høydahl / Cominvent
Hi, To match 1, 2, 3, 4 below you could use a fieldtype based on TextField, with just a simple WordDelimiterFilterFactory. However, this would also match abc-def, def.alpha, xyz-com and a...@def, because all punctuation is treated the same. To avoid this, you could do some custom handling of -, .
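One possible form of the "custom handling" Jan mentions is to split on `@` and `.` but never on `-`. The plain-Java sketch below shows that token set; it is not a Solr analyzer. It assumes the address in the question was of the form `abc.def@alpha-xyz.com`, since `def` appears in the wish list. That example address and the class name `EmailTokens` are hypothetical. Following abhishes' later reply, the sketch also drops the top-level domain (`com`). Inside Solr, this logic would have to be wrapped in a custom tokenizer or filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class EmailTokens {
    // Keeps the whole address, splits the local part on '.', and keeps each domain
    // label except the top-level one. Hyphens are never treated as separators.
    static List<String> tokens(String email) {
        String lower = email.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        out.add(lower); // 1. whole address
        int at = lower.indexOf('@');
        if (at < 0) {
            return out; // not an address: index it as a single token
        }
        out.addAll(Arrays.asList(lower.substring(0, at).split("\\."))); // abc, def
        String[] labels = lower.substring(at + 1).split("\\.");
        for (int i = 0; i < labels.length - 1; i++) { // skip the TLD, e.g. "com"
            out.add(labels[i]); // alpha-xyz
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("abc.def@alpha-xyz.com"));
        // [abc.def@alpha-xyz.com, abc, def, alpha-xyz]
    }
}
```

Because `-` is not a separator, this approach never produces the unwanted `abc-def` or `xyz-com` style matches.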

How to add SpellCheckResponse to Solritas?

2010-02-09 Thread Jan Høydahl / Cominvent
Hi, I'm using the /itas requestHandler, and would like to add spell-check suggestions to the output. I have spell-check configured and working in the XML response writer, but nothing is output in Velocity. Debugging the JSON $response object, I cannot find any representation of spellcheck

Bigram term vectors and weights possible with Solr?

2010-02-09 Thread Mike Hughes
Hello, One of the commercial search platforms I work with has the concept of 'document vectors', which are 1-gram and 2-gram phrases and their associated tf/idf weights on a 0-1 scale, i.e. [banana pie, 0.99] means banana pie is very relevant for this document. During the ingest/indexing process

Re: Bigram term vectors and weights possible with Solr?

2010-02-09 Thread Ahmet Arslan
I've been looking at the Solr TermVectorComponent (http://wiki.apache.org/solr/TermVectorComponent) and it seems to have something similar to this, but it looks to me like this is a component that is processed at query time (?) and is limited to 1-gram terms. If you use filter

after flush: fdx size mismatch on query during writes

2010-02-09 Thread Acadaca
We are using Solr 1.4 in a multi-core setup with replication. Whenever we write to the master we get the following exception: java.lang.RuntimeException: after flush: fdx size mismatch: 1285 docs vs 0 length in bytes of _gqg.fdx file exists?=false at

Re: Solr usage with Auctions/Classifieds?

2010-02-09 Thread Lance Norskog
The class was added in 2007 and hasn't changed. I don't know if anyone uses it. Presumably sort-by-function will use it. On Tue, Feb 9, 2010 at 5:59 AM, Jan Høydahl / Cominvent jan@cominvent.com wrote: With the new sort by function in 1.5

Re: Bigram term vectors and weights possible with Solr?

2010-02-09 Thread Mike Hughes
Thank you Ahmet, this is exactly what I was looking for. Looks like the shingle filter can produce 3+-gram terms as well, that's great. I'm going to try this with both western and CJK language tokenizers and see how it turns out. On Tue, Feb 9, 2010 at 5:07 PM, Ahmet Arslan iori...@yahoo.com
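A shingle filter emits word n-grams ("shingles") alongside the original tokens, which is what makes bigram and trigram terms available to the TermVectorComponent. The plain-Java sketch below imitates that output with unigrams enabled, roughly what the factory's `maxShingleSize` and `outputUnigrams` options control as I understand them. It is not Lucene's `ShingleFilter`, whose token order and position handling may differ, and the class name `Shingles` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Shingles {
    // For each start position, emits the 1-gram, 2-gram, ... up to maxSize-gram, space-joined.
    static List<String> shingles(List<String> tokens, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder sb = new StringBuilder();
            for (int n = 1; n <= maxSize && start + n <= tokens.size(); n++) {
                if (n > 1) {
                    sb.append(' ');
                }
                sb.append(tokens.get(start + n - 1)); // extend the previous n-gram by one token
                out.add(sb.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles(Arrays.asList("banana", "pie", "recipe"), 2));
        // [banana, banana pie, pie, pie recipe, recipe]
    }
}
```

Raising `maxSize` to 3 adds the trigram `banana pie recipe`, which corresponds to the "3+-gram" case Mike mentions.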

Copying dynamic fields into default text field messing up fieldNorm?

2010-02-09 Thread Yu-Shan Fung
Hi All, I'm trying to create an index of documents, where for each document, I am trying to associate with it a set of related keywords, each with individual boost values that I compute externally. eg: Document Title: Democrats related keywords: liberal: 4.0 politics: 1.5 obama:
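For context on the fieldNorm question, the classic Lucene `DefaultSimilarity` of this era computes `lengthNorm` as 1/sqrt(number of terms in the field). It folds index-time document and field boosts into the same norm, which is then stored lossily in a single byte. Copying many keywords into the default text field therefore raises its term count and lowers its norm. The sketch below reproduces only that formula and ignores the byte encoding. The class name `FieldNormSketch` is hypothetical.

```java
public class FieldNormSketch {
    // Classic DefaultSimilarity length normalization: 1 / sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Index-time boosts multiply into the same value (before Lucene's lossy one-byte encoding).
    static float norm(float docBoost, float fieldBoost, int numTerms) {
        return docBoost * fieldBoost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        System.out.println(norm(1.0f, 1.0f, 4));  // 0.5
        System.out.println(norm(1.0f, 1.0f, 16)); // 0.25: four times the terms halves the norm
    }
}
```

A document whose text field grows from 4 to 16 terms because keywords were copied in would see its fieldNorm halve. Any per-keyword boosts then interact with that reduced value.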

Re: Indexing / querying multiple data types

2010-02-09 Thread Lance Norskog
A couple of minor problems: The qt parameter (Que Tee) selects the parser for the q (Q for query) parameter. I think you mean 'qf': http://wiki.apache.org/solr/DisMaxRequestHandler#qf_.28Query_Fields.29 Another problem with atomID, atomId, atomid: Solr field names are case-sensitive. I don't

Re: Posting pdf file and posting from remote

2010-02-09 Thread Lance Norskog
stream.file= means read a local file from the server that solr runs on. It has to be a complete path that works from that server. To load the file over HTTP you have to use @filename to have curl open it. This path has to work from the program you run curl on, and relative paths work. Also, tika

Re: Distributed search and haproxy and connection build up

2010-02-09 Thread Lance Norskog
This goes through the Apache Commons HTTP client library: http://hc.apache.org/httpclient-3.x/ We used 'balance' at another project and did not have any problems. On Tue, Feb 9, 2010 at 5:54 AM, Ian Connor ian.con...@gmail.com wrote: I have been using distributed search with haproxy but noticed

analysing wild carded terms

2010-02-09 Thread Joe Calderon
hello *, quick question, what would i have to change in the query parser to allow wildcarded terms to go through text analysis?

Re: Autosuggest and highlighting

2010-02-09 Thread Lance Norskog
To select the whole string, I think you want hl.fragmenter=regex and to create a regex pattern for your entire strings: http://www.lucidimagination.com/search/document/CDRG_ch07_7.9?q=highlighter+multi-valued This will let you select the entire string field. But I don't know how to avoid the

Re: Is it possible to exclude results from other languages?

2010-02-09 Thread Lance Norskog
That's what I was going to look up :) The nutch thing works reasonably well. It comes with a training database from various languages. It had some UTF-8 problems in the files. The trick here is to come up with a balanced volume of text for all languages so that one language's patterns do not

Re: Solr/Drupal Integration - Query Question

2010-02-09 Thread Lance Norskog
The admin/form.jsp is supposed to prepopulate fl= with '*,score' which means bring back all fields and the calculated relevance score. This is the Drupal search, decoded. I changed the %2B to + signs for readability. Have a look at the filter query fq= and the facet date range. Also, in Solr 1.4
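Decoding a captured query string like the Drupal one is easy to do programmatically. Note that in form-encoded query strings a literal `+` means a space, which is why a required-clause `+` has to travel as `%2B`. The sketch below uses `java.net.URLDecoder`; the sample string `fq=%2Btype%3Astory` is a made-up example, not the actual Drupal query, and the class name `DecodeQuery` is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DecodeQuery {
    // Form-decodes a query string: %XX escapes become characters and '+' becomes a space.
    static String decode(String s) throws UnsupportedEncodingException {
        return URLDecoder.decode(s, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(decode("fl=*%2Cscore&fq=%2Btype%3Astory")); // fl=*,score&fq=+type:story
        System.out.println(decode("q=a+b"));                           // q=a b (bare '+' is a space)
    }
}
```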

Re: after flush: fdx size mismatch on query during writes

2010-02-09 Thread Lance Norskog
We need more information. How big is the index in disk space? How many documents? How many fields? What's the schema? What OS? What Java version? Do you run this on a local hard disk or is it over an NFS mount? Does this software commit before shutting down? If you run with asserts on do you

Re: Question on Tokenizing email address

2010-02-09 Thread abhishes
Thank you! It works very well. I think that the field type you suggested will also index words like DOT, AT and com. In order to prevent these words from getting indexed, I have changed the field type to fieldType name=email class=solr.TextField positionIncrementGap=100 analyzer

Re: Is it possible to exclude results from other languages?

2010-02-09 Thread Shalin Shekhar Mangar
On Wed, Feb 10, 2010 at 10:09 AM, Lance Norskog goks...@gmail.com wrote: Thanks for the pointer to ngramj (LGPL license), which then leads to another contender, http://tcatng.sourceforge.net/ (BSD license). The latter would make a great DIH Transformer that could go into contrib/ (hint