RE: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

2012-08-29 Thread Alexander Cougarman
t. Or use a tool such as: http://download.cnet.com/Docx-to-Doc-Converter/3000-2079_4-75206386.html -- Jack Krupansky -----Original Message----- From: Alexander Cougarman Sent: Wednesday, August 29, 2012 9:59 AM To: 'solr-user@lucene.apache.org' Subject: RE: Unexcpected RuntimeException whe

RE: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

2012-08-29 Thread Alexander Cougarman
It may be possible for you to drop the old Tika 1.0 into Solr 4.0, but I wouldn't try to guarantee that. In any case, this should be filed in Jira as a bug in Solr 4.0-BETA (SolrCell/Extraction component). -- Jack Krupansky -----Original Message----- From: Alexander Cougarman Sent: Wednesday,

Unexcpected RuntimeException when indexing with Solr 4.0 Beta

2012-08-29 Thread Alexander Cougarman
Hi. I'm using Solr 4.0 Beta (no modifications to default installation) to index, and it's blowing up on some Word docs: curl "http://localhost:8983/solr/update/extract?literal.id=doc15&commit=true" -F "myfile=@15.doc" Here's the exception. And the same files go through Solr 3.6.1 just fine.
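
For anyone scripting the request above rather than typing it, here is a minimal sketch that builds the same extract URL. The function name build_extract_url and its arguments are hypothetical, chosen only for illustration; the host, port, and the literal.id and commit parameters are taken from the curl command in the message. It only assembles the URL and does not contact Solr or upload the file.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, commit=True):
    """Build a URL for Solr's /update/extract handler.

    base_url and doc_id are illustrative arguments; the parameter names
    (literal.id, commit) come from the curl command quoted in the thread.
    """
    params = {
        "literal.id": doc_id,                    # unique id stored with the document
        "commit": "true" if commit else "false",  # commit right after indexing
    }
    # urlencode escapes any characters that are unsafe in a query string.
    return f"{base_url}/update/extract?{urlencode(params)}"


if __name__ == "__main__":
    print(build_extract_url("http://localhost:8983/solr", "doc15"))
```

When the URL is passed to curl on a command line it needs to stay inside quotes, because an unquoted & would be interpreted by the shell.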

RE: Two-dimensional array in Solr schema

2012-08-26 Thread Alexander Cougarman
More detail: So, when Solr returns results, we'd get XML that looks like this: dkjfkjdfkjdkfj kdjfkjdkfj Sincerely, Alex -----Original Message----- From: Alexander Cougarman [mailto:acoug...@bwc.org] Sent: 26 August 2012 10:00 AM To: solr-user@lucene.apach

Multi-core setup in Solr 4.0

2012-08-26 Thread Alexander Cougarman
Hi. I'm looking for a tutorial on how to set up two cores for a Solr 4.0 Beta instance. I've found this tutorial for earlier versions of Solr: http://drupal.org/node/484800 Also, what are "Collections" in Solr 4? Are they related to cores? Thanks. Sincerely, Alex

Two-dimensional array in Solr schema

2012-08-26 Thread Alexander Cougarman
Hi. We're using Solr 4.0 Beta. Is it possible to have a 2-dimensional array in Solr schema? For example, you want to store this information in a field:
MyCustomField:
- Text
- FileName
So each text has a filename associated with it. Thanks. Sincerely, Alex

RE: Can't extract Outlook message files

2012-08-25 Thread Alexander Cougarman
This is an issue with "extractOnly=true" on Solr 3.6.1. We upgraded to 4.0 Beta 2 and the problem went away. Just in case anyone runs into this. Sincerely, Alex -----Original Message----- From: Alexander Cougarman [mailto:acoug...@bwc.org] Sent: 23 August 2012 12:27 PM To:

Can't extract Outlook message files

2012-08-23 Thread Alexander Cougarman
Hi. We're trying to use the following curl command to perform an "extract only" of a *.MSG file, but it blows up: curl "http://localhost:8983/solr/update/extract?extractOnly=true" -F "myfile=@92.msg" If we do this, it works fine: curl "http://localhost:8983/solr/update/extract?literal.

RE: Use a different folder for schema.xml

2012-08-22 Thread Alexander Cougarman
Subject: Re: Use a different folder for schema.xml It is possible to store the entire conf/ directory somewhere. To store only the schema.xml file, try soft links or the XML include feature: conf/schema.xml includes from somewhere else. On Tue, Aug 21, 2012 at 11:31 PM, Alexander Cougarman wrote

Which directories are required in Solr?

2012-08-22 Thread Alexander Cougarman
Hi. Which folders/files can be deleted from the default Solr package (apache-solr-3.6.1.zip) on Windows if all we'd like to do is index/store documents? Thanks. Sincerely, Alex

Use a different folder for schema.xml

2012-08-21 Thread Alexander Cougarman
Hi. For our Solr instance, we need to put the schema.xml file in a different location than where it resides now. Is this possible? Thanks. Sincerely, Alex

RE: How to get raw text of a document

2012-08-18 Thread Alexander Cougarman
a document You need a "response writer" that returns only text. The "wt" parameter selects the response writer. You specified "json", so that's what you got. Maybe "csv" would be closer to what you want. -- Jack Krupansky -----Original Message----- From:

How to get raw text of a document

2012-08-17 Thread Alexander Cougarman
Hi. I asked this on the Tika group and the recommendation was to ask it here. I am using the following C# code to call Tika and would like it to return the raw text without any XML or JSON. So if the Word document contains "Hello World", this should return only that text and no XML or anything e

Wildcard searches in phrases throws exception

2012-08-07 Thread Alexander Cougarman
Hi, Is it possible to do wildcard searches on multiple words? Here's an example: We need to search on the words "Dearly loved friends" using this text:dearly * friends This blows up Solr with this exception. From my Googling, I see that the error has to do with too many tokens being creat

RE: Synonym file for American-British words

2012-08-07 Thread Alexander Cougarman
Sorry, the VarCon file is here: http://wordlist.sourceforge.net/ Sincerely, Alex -----Original Message----- From: Alexander Cougarman [mailto:acoug...@bwc.org] Sent: 7 August 2012 5:09 PM To: solr-user@lucene.apache.org Subject: Synonym file for American-British words Dear friends, Is there

Synonym file for American-British words

2012-08-07 Thread Alexander Cougarman
Dear friends, Is there a downloadable synonym file for American-British words? This page has some, for example the VarCon file, but it's not in the Solr synonyms.txt format. We need something that can normalize words like "center" to "centre". The VarCon file has it, but it's in the wrong format.
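
One possible way to approach the conversion asked about above is a small script that rewrites word pairs as Solr explicit synonym mappings ("center => centre"). The sketch below assumes the input has already been reduced to tab-separated lines with the American spelling first and the British spelling second; that layout is an assumption for illustration and is not necessarily the layout of the VarCon file itself. The function names are hypothetical.

```python
def parse_tab_separated(text):
    """Read (american, british) pairs from tab-separated lines.

    Assumed input layout (not the real VarCon format):
    american<TAB>british, one pair per line, '#' starts a comment line.
    """
    pairs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) >= 2:
            pairs.append((fields[0], fields[1]))
    return pairs


def to_solr_synonyms(pairs):
    """Turn (american, british) pairs into 'american => british' lines."""
    lines = []
    for american, british in pairs:
        american, british = american.strip(), british.strip()
        # Skip empty entries and words spelled the same way in both variants.
        if not american or not british or american == british:
            continue
        lines.append(f"{american} => {british}")
    return lines


if __name__ == "__main__":
    sample = "center\tcentre\ncolor\tcolour\n"
    print("\n".join(to_solr_synonyms(parse_tab_separated(sample))))
```

The resulting lines could be saved to a synonyms file and referenced from a synonym filter in the field type's analyzer. Mapping in one direction only, as here, normalizes "center" to "centre" rather than treating the two spellings as interchangeable.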

Stemming questions

2012-08-07 Thread Alexander Cougarman
Dear friends, A few questions on stemming support in Solr 3.6.1:
- Can you do non-English stemming?
- We're using solr.PorterStemFilterFactory on the "text_en" field type. We will index a ton of PDF, DOCX, etc. docs in multiple languages. Is this the best filter factory to use for stemming?