Indexing Failed rolled back

2011-01-25 Thread Dinesh
I did some research on the schema and the DIH config file and created my own DIH. I'm getting this error when I run it; the response is:

<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
<lst name="initArgs"><lst name="defaults"><str name="config">try.xml</str></lst></lst>
<str name="command">full-import</str>

Re: DIH serialize

2011-01-25 Thread Stefan Matheis
Rich, I played around for a few minutes with Script-Transformers, but I don't have enough knowledge to get anything done right now :/ My idea was: loop over the given row, which should be a Java HashMap or something like that, and do something like this (pseudo-code): var row_data = []; for( var key
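Completing that idea as a rough sketch of a DIH ScriptTransformer (the function name and target field are made up; in DIH's script context, row is a java.util.HashMap and must be returned):

<script><![CDATA[
  function my_serialize(row) {
    var parts = [];
    var keys = row.keySet().toArray();        // keys of the Java HashMap
    for (var i = 0; i < keys.length; i++) {
      parts.push(keys[i] + '=' + row.get(keys[i]));
    }
    row.put('serialized', parts.join('||')); // store the result in a new field
    return row;
  }
]]></script>

The entity would then reference it with transformer="script:my_serialize".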

Re: synonyms file, and example cases

2011-01-25 Thread Stefan Matheis
Cam, the examples with the provided inline documentation should help you, no? http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory The backslash \ in that context is an escape character, to prevent the = from being interpreted as part of the => mapping syntax. Regards Stefan
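For illustration, a small synonyms.txt fragment in that style (the tokens are made up); the backslash lets a literal => survive inside a token:

  # maps the literal token "a=>a" to the literal token "b=>b"
  a\=>a => b\=>b
  # ordinary comma-separated synonyms need no escaping
  ipod, i-pod, i pod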

Performance optimization of Proximity/Wildcard searches

2011-01-25 Thread Salman Akram
Hi, I am facing performance issues with three types of queries (and their combinations). Some of the queries take more than 2-3 minutes. Index size is around 150GB. - Wildcard - Proximity - Phrases (with common words) I know CommonGrams and stop words are a good way to resolve such issues
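As a reference point for the CommonGrams approach mentioned above, a minimal field type sketch (field and file names are assumptions, not from the original post); CommonGramsFilterFactory forms grams from common words at index time, and CommonGramsQueryFilterFactory applies the matching logic at query time:

  <fieldType name="text_cg" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt" ignoreCase="true"/>
    </analyzer>
  </fieldType>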

Re: please help Problem with dataImportHandler

2011-01-25 Thread Stefan Matheis
Caused by: org.xml.sax.SAXParseException: Element type "field" must be followed by either attribute specifications, ">" or "/>". Sounds like invalid XML in your .. dataimport-config? On Tue, Jan 25, 2011 at 5:41 AM, Dinesh mdineshkuma...@karunya.edu.in wrote: http://pastebin.com/tjCs5dHm this is the
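That SAX error usually points at an unclosed or malformed element. A contrived before/after, assuming a field element in the DIH config:

  <!-- broken: the element is never closed -->
  <field column="month" name="month"
  <!-- fixed: self-closing -->
  <field column="month" name="month" />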

Re: please help Problem with dataImportHandler

2011-01-25 Thread Dinesh
Yes, even after correcting it, it is still throwing an exception. - DINESHKUMAR . M I am neither especially clever nor especially gifted. I am only very, very curious.

Re: Getting started with writing parser

2011-01-25 Thread Gora Mohanty
On Tue, Jan 25, 2011 at 10:05 AM, Dinesh mdineshkuma...@karunya.edu.in wrote: http://pastebin.com/CkxrEh6h this is my sample log [...] And which portions of the log text do you want to preserve? Does it go into Solr as a single error message, or do you want to separate out parts of it?

Re: Getting started with writing parser

2011-01-25 Thread Dinesh
I want to extract the month, time, DHCPMESSAGE, from_mac, gateway_ip, net_ADDR - DINESHKUMAR . M I am neither especially clever nor especially gifted. I am only very, very curious.

Re: please help Problem with dataImportHandler

2011-01-25 Thread Dinesh
http://lucene.472066.n3.nabble.com/Getting-started-with-writing-parser-tp2278092p2327738.html this thread explains my problem - DINESHKUMAR . M I am neither especially clever nor especially gifted. I am only very, very curious.

Re: Getting started with writing parser

2011-01-25 Thread Gora Mohanty
On Tue, Jan 25, 2011 at 11:44 AM, Dinesh mdineshkuma...@karunya.edu.in wrote: i don't even know whether the regex that i'm using for my log is correct or not. If it is the same try.xml that you posted earlier, it is very likely not going to work. You seem to have just cut and pasted

Re: Getting started with writing parser

2011-01-25 Thread Dinesh
No, I actually changed the directory to mine, where I stored the log files: it is /home/exam/apa..solr/example/exampledocs. I specified it in the Solr schema and created a DataImportHandler for it in try.xml; then in that I changed the file name to sample.txt. That new try.xml is

Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-25 Thread Gary Taylor
Hi, I posted a question in November last year about indexing content from multiple binary files into a single Solr document and Jayendra responded with a simple solution to zip them up and send that single file to Solr. I understand that the Tika 0.4 JARs supplied with Solr 1.4.1 don't

DIH From various File system locations

2011-01-25 Thread pankaj bhatt
Hi All, I need to index the documents present in my file system at various locations (e.g. C:\docs, D:\docs). Is there any way through which I can specify this in my DIH configuration? Here is my configuration: <document> <entity name="sd"
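One plausible way to express that (a sketch; paths, entity names and the file pattern are assumptions): one FileListEntityProcessor entity per root directory:

  <document>
    <entity name="docs_c" processor="FileListEntityProcessor" rootEntity="false"
            baseDir="C:/docs" fileName=".*\.(doc|pdf|txt)" recursive="true">
      <!-- nested entity to parse each file goes here -->
    </entity>
    <entity name="docs_d" processor="FileListEntityProcessor" rootEntity="false"
            baseDir="D:/docs" fileName=".*\.(doc|pdf|txt)" recursive="true">
      <!-- nested entity to parse each file goes here -->
    </entity>
  </document>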

Re: Performance optimization of Proximity/Wildcard searches

2011-01-25 Thread Toke Eskildsen
On Tue, 2011-01-25 at 10:20 +0100, Salman Akram wrote: Cache warming is a good option too, but the index gets updated every hour so I'm not sure how much that would help. What is the time difference between queries with a warmed index and a cold one? If the warmed index performs satisfactorily, then

Recommendation on RAM-/Cache configuration

2011-01-25 Thread Martin Grotzke
Hi, recently we're experiencing OOMEs (GC overhead limit exceeded) in our searches. Therefore I want to get some clarification on heap and cache configuration. This is the situation: - Solr 1.4.1 running on tomcat 6, Sun JVM 1.6.0_13 64bit - JVM Heap Params: -Xmx8G -XX:MaxPermSize=256m
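For orientation, the solrconfig.xml cache entries such heap questions usually revolve around look like the following (sizes here are placeholders, not recommendations); every entry lives on that 8G heap:

  <filterCache class="solr.FastLRUCache" size="4096" initialSize="1024" autowarmCount="256"/>
  <queryResultCache class="solr.LRUCache" size="2048" initialSize="512" autowarmCount="128"/>
  <documentCache class="solr.LRUCache" size="8192" initialSize="2048"/>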

Re: Specifying an AnalyzerFactory in the schema

2011-01-25 Thread Renaud Delbru
Hi Chris, On 24/01/11 21:18, Chris Hostetter wrote: : I notice that in the schema, it is only possible to specify an Analyzer class, : but not a Factory class as for the other elements (Tokenizer, Filter, etc.). : This limits the use of this feature, as it is impossible to specify parameters :

Use terracotta bigmemory for solr-caches

2011-01-25 Thread Martin Grotzke
Hi, as the biggest parts of our jvm heap are used by solr caches I asked myself if it wouldn't make sense to run solr caches backed by terracotta's bigmemory (http://www.terracotta.org/bigmemory). The goal is to reduce the time needed for full / stop-the-world GC cycles, as with our 8GB heap the

Re: Performance optimization of Proximity/Wildcard searches

2011-01-25 Thread Salman Akram
By "warmed index" do you mean warming the Solr cache or the OS cache? As I said, our index is updated every hour so I am not sure how much the Solr cache would help, but the OS cache should still help, right? I haven't compared the results with a proper script, but from manual testing here are some

Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene

2011-01-25 Thread Markus Jelsma
Hi, Are you sure you need CMS incremental mode? It's only advised when running on a machine with one or two processors. If you have more, you should consider disabling the incremental flags. Cheers, On Monday 24 January 2011 19:32:38 Simon Wistow wrote: We have two slaves replicating off one
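For concreteness, the flags in question (whether they are actually set must be checked in the poster's JVM options): incremental mode runs under CMS with

  -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing

and on machines with more than a couple of cores, dropping the two incremental flags and keeping plain CMS is the usual advice.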

Re: Weird behaviour with phrase queries

2011-01-25 Thread Erick Erickson
Frankly, this puzzles me. It *looks* like it should be OK. One warning, the analysis page sometimes is a bit misleading, so beware of that. But the output of your queries make it look like the query is parsing as you expect, which leaves the question of whether your index contains what you think

Re: Recommendation on RAM-/Cache configuration

2011-01-25 Thread Markus Jelsma
On Tuesday 25 January 2011 11:54:55 Martin Grotzke wrote: Hi, recently we're experiencing OOMEs (GC overhead limit exceeded) in our searches. Therefore I want to get some clarification on heap and cache configuration. This is the situation: - Solr 1.4.1 running on tomcat 6, Sun JVM

Re: Adding weightage to the facets count

2011-01-25 Thread Johannes Goll
Hi Siva, try using the Solr Stats Component http://wiki.apache.org/solr/StatsComponent similar to select/?q=*:*&stats=true&stats.field={your-weight-field}&stats.facet={your-facet-field} and get the sum field from the response. You may need to re-sort the weighted facet counts to get a descending

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread johnnyisrael
Hi Erick, You are right, there is a copyField to an EdgeNGram field. I tried the configuration but it is not working as expected. Configuration I tried: <fieldType name="query" class="solr.TextField" positionIncrementGap="100" termVectors="true"> <analyzer

Re: DIH From various File system locations

2011-01-25 Thread Estrada Groups
I would just use Nutch and specify the -solr param on the command line. That will add the extracted content to your instance of Solr. Adam Sent from my iPhone On Jan 25, 2011, at 5:29 AM, pankaj bhatt panbh...@gmail.com wrote: Hi All, I need to index the documents present in my file

Re: Recommendation on RAM-/Cache configuration

2011-01-25 Thread Martin Grotzke
On Tue, Jan 25, 2011 at 2:06 PM, Markus Jelsma markus.jel...@openindex.iowrote: On Tuesday 25 January 2011 11:54:55 Martin Grotzke wrote: Hi, recently we're experiencing OOMEs (GC overhead limit exceeded) in our searches. Therefore I want to get some clarification on heap and cache

Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-25 Thread Erlend Garåsen
On 25.01.11 11.30, Erlend Garåsen wrote: Tika version 0.8 is not included in the latest release/trunk from SVN. Ouch, I wrote not instead of now. Sorry, I replied in a hurry. And to clarify, by content I mean the main content of a Word file. Title and other kinds of metadata are

Re: Getting started with writing parser

2011-01-25 Thread Gora Mohanty
On Tue, Jan 25, 2011 at 3:46 PM, Dinesh mdineshkuma...@karunya.edu.in wrote: no i actually changed the directory to mine where i stored the log files.. it is /home/exam/apa..solr/example/exampledocs i specified it in a solr schema.. i created an DataImportHandler for that in try.xml.. then

Re: Use terracotta bigmemory for solr-caches

2011-01-25 Thread Em
Hi Martin, are you sure that your GC is well tuned? A request that needs more than a minute isn't normal, even considering all the other postings about response performance... Regards

Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-25 Thread Gary Taylor
Thanks Erlend. Not used SVN before, but have managed to download and build the latest trunk code. Now I'm getting an error when trying to access the admin page (via Jetty) because I specify HTMLStripStandardTokenizerFactory in my schema.xml, but this appears to be no longer supplied as part of

List of indexed or stored fields

2011-01-25 Thread kenf_nc
I use a lot of dynamic fields, so looking at my schema isn't a good way to see all the field names that may be indexed across all documents. Is there a way to query solr for that information? All field names that are indexed, or stored? Possibly a count by field name? Is there any other metadata

Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-25 Thread Gary Taylor
OK, got past the schema.xml problem, but now I'm back to square one. I can index the contents of binary files (Word, PDF etc...), as well as text files, but it won't index the content of files inside a zip. As an example, I have two txt files - doc1.txt and doc2.txt. If I index either of

Re: List of indexed or stored fields

2011-01-25 Thread Juan Grande
You can query all the indexed or stored fields (including dynamic fields) using the LukeRequestHandler: http://localhost:8983/solr/example/admin/luke See also: http://wiki.apache.org/solr/LukeRequestHandler Regards, Juan G. Grande -- Solr Consultant @ http://www.plugtree.com -- Blog @

Re: DIH From various File system locations

2011-01-25 Thread pankaj bhatt
Thanks Adam, It seems like Nutch would solve most of my concerns. It would be great if you could share some resources for Nutch with us. / Pankaj Bhatt. On Tue, Jan 25, 2011 at 7:21 PM, Estrada Groups estrada.adam.gro...@gmail.com wrote: I would just use Nutch and specify the -solr param on the

Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-25 Thread Jayendra Patil
Hi Gary, The latest Solr Trunk was able to extract and index the contents of the zip file using the ExtractingRequestHandler. The snapshot of Trunk we worked upon had the Tika 0.8 snapshot jars and worked pretty well. Tested again with sample url and works fine - curl

How to Configure Solr to pick my lucene custom filter

2011-01-25 Thread Valiveti
Hi, I have written a Lucene custom filter. I could not figure out how to configure Solr to pick this custom filter for search. How do I configure Solr to pick my custom filter? Will the Solr standard search handler pick this custom filter? Thanks, Valiveti

in-index representaton of tokens

2011-01-25 Thread Dennis Gearon
So, the index is a list of tokens per column, right? There's a table per column that lists the analyzed tokens? And the tokens per column are represented as what, system integers? 32/64 bit unsigned ints? Dennis Gearon Signature Warning It is always a good idea to learn

Re: in-index representaton of tokens

2011-01-25 Thread Jonathan Rochkind
Why does it matter? You can't really get at them unless you store them. I don't know what table per column means, there's nothing in Solr architecture called a table or a column. Although by column you probably mean more or less Solr field. There is nothing like a table in Solr. Solr is

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread Erick Erickson
Let's back up here, because now I'm not clear what you actually want. EdgeNGrams are a way of matching substrings, which is what's happening here. Of course searching for apple matches any of the three examples, just as searching for apple without grams would match; that's the expected behavior. So,

Re: Highlighting with/without Term Vectors

2011-01-25 Thread Salman Akram
Anyone? On Tue, Jan 25, 2011 at 12:57 AM, Salman Akram salman.ak...@northbaysolutions.net wrote: Just to add one thing, in case it makes a difference: the max document size on which highlighting needs to be done is a few hundred KBs (in the file system). In the index it's compressed, so it should be much

Re: How to Configure Solr to pick my lucene custom filter

2011-01-25 Thread Erick Erickson
Presumably your custom filter is in a jar file. Drop that jar file in solr_home/lib and refer to it from your schema.xml file by its full name (e.g. com.yourcompany.filter.yourcustomfilter) just like the other filters, and it should work fine. You can also put your jar anywhere you'd like and alter
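A sketch of what that schema.xml reference might look like (class and field names are made up; note that what schema.xml actually names is the filter's factory class, so the jar needs to provide one):

  <fieldType name="text_custom" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="com.yourcompany.filter.YourCustomFilterFactory"/>
    </analyzer>
  </fieldType>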

Re: List of indexed or stored fields

2011-01-25 Thread kenf_nc
That's exactly what I wanted, thanks. Any idea what <long name="version">1294513299077</long> refers to under the index section? I have 2 cores on one Tomcat instance, and 1 on a second instance (different server), and all 3 have different numbers for version, so I don't think it's the version of

Re: List of indexed or stored fields

2011-01-25 Thread Markus Jelsma
The index version. Can be used in replication to determine whether to replicate or not. On Tuesday 25 January 2011 20:30:21 kenf_nc wrote: refers to under the index section? I have 2 cores on one Tomcat instance, and 1 on a second instance (different server) and all 3 have different numbers

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread johnnyisrael
Hi Erick, What I want here is: let's say I have 3 documents like [pineapple vers apple, milk with apple, apple milk shake] and if I search for apple, it should return only apple milk shake, because that term alone starts with the word apple which I typed in. It should not bring the others, and if

Re: DIH From various File system locations

2011-01-25 Thread Adam Estrada
There are a few tutorials out there. 1. http://wiki.apache.org/nutch/RunningNutchAndSolr (not the most practical) 2. http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ (similar to 1.) 3. Build the latest from branch http://svn.apache.org/repos/asf/nutch/branches/branch-1.3/ and read this

Re: DIH From various File system locations

2011-01-25 Thread Adam Estrada
I take that back... I am currently using version 1.2, and make sure that the latest versions of Tika and PDFBox are in the contrib folder. 1.3 is structured a bit differently and it doesn't look like there is a contrib directory. Maybe one of the Nutch contributors can comment on this? Adam On

CFP - Berlin Buzzwords 2011 - Search, Score, Scale

2011-01-25 Thread Isabel Drost
This is to announce the Berlin Buzzwords 2011. The second edition of the successful conference on scalable and open search, data processing and data storage in Germany, taking place in Berlin. Call for Presentations Berlin Buzzwords

Re: How to Configure Solr to pick my lucene custom filter

2011-01-25 Thread Valiveti
Hi Erick, Thanks for the reply. I did see some entries in the solrconfig.xml for adding custom responseHandlers, queryParsers and queryResponseWriters, but could not find one for adding a custom filter. Could you point to the exact location or syntax to be used? Thanks, Valiveti

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread Jonathan Rochkind
I haven't figured out any way to achieve that AT ALL without making a separate Solr index just to serve autosuggest queries. At least when you want to auto-suggest on a multi-valued field. Someone posted a crazy tricky way to do it with a single-valued field a while ago. If you can/are willing

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread Markus Jelsma
Then you don't need NGrams at all. A wildcard will suffice or you can use the TermsComponent. If these strings are indexed as single tokens (KeywordTokenizer with LowercaseFilter) you can simply do field:app* to retrieve the apple milk shake. You can also use the string field type but then you

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread Markus Jelsma
Oh, I should perhaps mention that EdgeNGrams will yield results a lot quicker than wildcards, at the cost of a larger index. You should, of course, use EdgeNGrams if you worry about performance and have a huge index and a high number of queries per second. Then you don't need NGrams at all. A

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread mesenthil
The index contains around 1.5 million documents. As this is used for an autosuggest feature, performance is an important factor. So it looks like, using EdgeNGram, it is difficult to achieve the following: the result should return only those terms where the search letters match the first

Specifying optional terms with standard (lucene) request handler?

2011-01-25 Thread Daniel Pötzinger
Hi, I am searching for a way to specify optional terms in a query (terms that don't need to match, but if they do match should influence the scoring). Using the dismax parser, a query like this: <str name="mm">2</str> <str name="debugQuery">on</str> <str name="q">+lorem ipsum dolor amet</str> <str name="qf">content</str> <str

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread Jonathan Rochkind
Ah, sorry, I got confused about your requirements, if you just want to match at the beginning of the field, it may be more possible. Using edgegrams or wildcard. If you have a single-valued field. Do you have a single-valued or a multi-valued field? That is, does each document have just one

Re: Specifying optional terms with standard (lucene) request handler?

2011-01-25 Thread Jonathan Rochkind
With the 'lucene' query parser? Include q.op=OR and then put a + (mandatory) in front of every term in the 'q' that is NOT optional; the rest will be optional. I think that will do what you want. Jonathan On 1/25/2011 5:07 PM, Daniel Pötzinger wrote: Hi I am searching for a way to specify
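Applied to the example query from the original post, that suggestion would look roughly like this (assuming lorem and ipsum are the required terms):

  q=+lorem +ipsum dolor amet&q.op=OR

With q.op=OR, dolor and amet are optional but contribute to the score when they match, while the +-prefixed terms must match.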

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread mesenthil
Right now our configuration says multiValued=true, but that need not be true in our case. Will make it false, try it, and update this thread with more details.. -- View this message in context: http://lucene.472066.n3.nabble.com/EdgeNgram-Auto-suggest-doubles-ignore-tp2321919p2334627.html Sent

Re: Solr set up issues with Magento

2011-01-25 Thread Sandhya Padala
Thank you Markus. I have added a few more fields to schema.xml. Now it looks like the products are getting indexed, but there are no search results. In Magento, if I configure Solr as the search engine, search does not return any results. If I change the search engine to use Magento's inbuilt MySQL,

Best way to build a solr-based m2 project

2011-01-25 Thread Paul Libbrecht
Hello list, Apologies if this was already asked, I haven't found the answer in the archive. I've been out of this list for quite some time now, hence. I am looking at a good way to package a project based on maven2 that would create me a solr-based webapp. I would expect such projects as the

Re: in-index representaton of tokens

2011-01-25 Thread Dennis Gearon
I am saying there is a list of tokens that have been parsed (a table of them) for each column? Or one for the whole index? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes,

Re: How to Configure Solr to pick my lucene custom filter

2011-01-25 Thread Erick Erickson
First, let's be sure we're talking about the same thing. My response was for adding a filter to your analysis chain for a field in Schema.xml. Are you talking about a different sort of filter? Best Erick On Tue, Jan 25, 2011 at 4:09 PM, Valiveti narasimha.valiv...@gmail.comwrote: Hi Eric,

Re: in-index representaton of tokens

2011-01-25 Thread Markus Jelsma
This should shed some light on the matter http://lucene.apache.org/java/2_9_0/fileformats.html I am saying there is a list of tokens that have been parsed (a table of them) for each column? Or one for the whole index? Dennis Gearon Signature Warning It is always a

RE: DIH serialize

2011-01-25 Thread Papp Richard
Dear Stefan, thank you for your help! Well, I wrote a small script, even if not JSON, but it works:

<script><![CDATA[
  function my_serialize(row) {
    var st = "";
    st = row.get('stt_id') + "||" + row.get('stt_name') + "||" + row.get('stt_date_from') + "||"

RE: in-index representaton of tokens

2011-01-25 Thread Jonathan Rochkind
There aren't any tables involved. There's basically one list (per field) of unique tokens for the entire index, and also, a list for each token of which documents contain that token. Which is efficiently encoded, but I don't know the details of that encoding, maybe someone who does can tell

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread Erick Erickson
OK, try this. Use some analysis chain for your field like:

  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>

This can be a multiValued field, BTW. Now use the TermsComponent to fetch your data. See:
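For completeness, a plausible TermsComponent request against such a field (host, field name and limit are placeholders):

  http://localhost:8983/solr/terms?terms.fl=suggest_field&terms.prefix=app&terms.limit=10

terms.prefix restricts the returned terms to those starting with the typed letters, which matches the requirement that only terms beginning with apple come back.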

Re: Solr set up issues with Magento

2011-01-25 Thread Erick Erickson
There's almost no information to go on here. Please review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Tue, Jan 25, 2011 at 6:13 PM, Sandhya Padala geend...@gmail.com wrote: Thank you Markus. I have added few more fields to schema.xml. Now looks like the products are getting

DIH clean=false

2011-01-25 Thread cyang2010
I am not sure if I really understand what clean=false means. In my understanding, for a full-import with the default clean=true, it will blow away all documents in the existing index, then do a full import of data from a table into the index. Is that right? Then for clean=false, my understanding is that
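For reference, the parameter rides along on the DIH command URL (assuming the handler is registered at /dataimport):

  http://localhost:8983/solr/dataimport?command=full-import&clean=false

With clean=false the existing documents are kept and the imported rows are added on top of (or overwrite, by uniqueKey) what is already there; clean defaults to true for full-import.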