Regarding WordDelimiterFactory

2010-09-09 Thread Sandhya Agarwal
Hello, I have a file with the input string 91{40}9490949090, and I wanted to return this file when I search for the query string +91?40?9*. The problem is that, the input string is getting indexed as 3 terms 91, 40, 9490949090. Is there a way to consider { and } as part of the string
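
A minimal sketch of the splitting effect described above, assuming the analyzer treats every non-alphanumeric character as a delimiter. It uses plain Java string splitting and is not the actual Solr filter; the class and method names are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.List;

class DelimiterSplitSketch {
    // Split on every run of characters that are neither letters nor digits.
    // This only mimics the observed behaviour; it is not Solr's implementation.
    static List<String> splitOnNonAlphanumerics(String input) {
        return Arrays.asList(input.split("[^A-Za-z0-9]+"));
    }

    public static void main(String[] args) {
        // The braces act as delimiters, so three separate terms come out.
        System.out.println(splitOnNonAlphanumerics("91{40}9490949090"));
    }
}
```

Because the braces never reach the index as part of a term, a wildcard pattern that expects a single character between 91 and 40 has no single term to match against.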

Tika language extraction

2010-06-10 Thread Sandhya Agarwal
Hello, It is observed that TIKA does not extract the Content-Language for documents encoded in UTF-8. For natively encoded documents, it works fine. Any idea on how we can resolve this ? Thanks, Sandhya

Re: Example of using stream.file to post a binary file to solr

2010-05-07 Thread Sandhya Agarwal
Yes, I did. But, I don't find a solrj example there. The example in the doc uses curl. - Sent from iPhone On 07-May-2010, at 8:12 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Sorry. That is what I meant. But, I put it wrongly. I have not been : able to find examples of using

RE: Problem with pdf, upgrading Cell

2010-05-06 Thread Sandhya Agarwal
, 2010, at 3:28 AM, Sandhya Agarwal wrote: Hello, But I see that the libraries are being loaded : INFO: Adding specified lib dirs to ClassLoader May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/C:/apache-solr-1.4.0

Example of using stream.file to post a binary file to solr

2010-05-06 Thread Sandhya Agarwal
Hello, Can somebody please point me to an example, of how we can leverage *stream.file* for streaming documents, using UpdateRequest API. (SolrJ API) Thanks, Sandhya
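
For orientation, a hedged sketch of what a stream.file request looks like at the HTTP level. It only builds the request URL with the Java standard library and does not use the SolrJ UpdateRequest API that the question asks about. The host, port, file path and document id are hypothetical, and remote streaming is assumed to be enabled in solrconfig.xml.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class StreamFileUrlSketch {
    // URL-encode a parameter value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Build an extraction request that asks Solr to read the file itself
    // from its local file system (stream.file) instead of uploading it.
    static String extractUrl(String solrBase, String filePath, String docId) {
        return solrBase + "/update/extract"
                + "?stream.file=" + encode(filePath)
                + "&literal.id=" + encode(docId)
                + "&commit=true";
    }

    public static void main(String[] args) {
        // Hypothetical values, for illustration only.
        System.out.println(extractUrl("http://localhost:8983/solr", "/tmp/doc.pdf", "doc1"));
    }
}
```

The path given in stream.file is resolved on the Solr server, not on the client, so the file must be readable from the machine running Solr.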

Re: Example of using stream.file to post a binary file to solr

2010-05-06 Thread Sandhya Agarwal
Sorry. That is what I meant. But, I put it wrongly. I have not been able to find examples of using solrj, for this. - Sent from iPhone On 07-May-2010, at 1:23 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : Subject: Example of using stream.file to post a binary file to solr

RE: Problem with pdf, upgrading Cell

2010-05-05 Thread Sandhya Agarwal
Message- From: Sandhya Agarwal [mailto:sagar...@opentext.com] Sent: Wednesday, May 05, 2010 10:06 AM To: solr-user@lucene.apache.org Subject: RE: Problem with pdf, upgrading Cell Praveen, I only have the highlighted jars copied. Not sure, if we need the other jars. Also, I

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
Hello, But I see that the libraries are being loaded : INFO: Adding specified lib dirs to ClassLoader May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/asm-3.1.jar' to classloader May 4,

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
Yes, Grant. You are right. Copying the tika libraries to solr webapp, solved the issue and the content extraction works fine now. Thanks, Sandhya -Original Message- From: Sandhya Agarwal [mailto:sagar...@opentext.com] Sent: Tuesday, May 04, 2010 12:58 PM To: solr-user@lucene.apache.org

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
-Original Message- From: Sandhya Agarwal [mailto:sagar...@opentext.com] Sent: Tuesday, May 04, 2010 1:10 PM To: solr-user@lucene.apache.org Subject: RE: Problem with pdf, upgrading Cell Yes, Grant. You are right. Copying the tika libraries to solr webapp, solved the issue and the content

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
PM To: solr-user@lucene.apache.org Subject: Re: Problem with pdf, upgrading Cell Yes, it is loading the libraries, but they are in a different classloader that apparently the new way Tika loads doesn't have access to. -Grant On May 4, 2010, at 3:28 AM, Sandhya Agarwal wrote: Hello

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
Praveen, Along with the tika core and parser jars, did you run mvn dependency:copy-dependencies, to generate all the dependencies too. Thanks, Sandhya -Original Message- From: Praveen Agrawal [mailto:pkal...@gmail.com] Sent: Tuesday, May 04, 2010 4:52 PM To:

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
@lucene.apache.org Subject: Re: Problem with pdf, upgrading Cell Yes Sandhya, i copied new poi/jempbox/pdfbox/fontbox etc jars too. I believe this is what you were asking. Thanks. On Tue, May 4, 2010 at 5:01 PM, Sandhya Agarwal sagar...@opentext.com wrote: Praveen, Along with the tika core

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
, i copied new poi/jempbox/pdfbox/fontbox etc jars too. I believe this is what you were asking. Thanks. On Tue, May 4, 2010 at 5:01 PM, Sandhya Agarwal sagar...@opentext.com wrote: Praveen, Along with the tika core and parser jars, did you run mvn dependency:copy

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
On Tue, May 4, 2010 at 5:28 PM, Sandhya Agarwal sagar...@opentext.com wrote: Both the files work for me, Praveen. Thanks, Sandhya From: Praveen Agrawal [mailto:pkal...@gmail.com] Sent: Tuesday, May 04, 2010 5:22 PM To: solr-user@lucene.apache.org Subject: Re: Problem with pdf

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
.jar metadata-extractor-2.4.0-beta-1.jar pdfbox-1.1.0.jar poi-3.6.jar poi-ooxml-3.6.jar poi-ooxml-schemas-3.6.jar poi-scratchpad-3.6.jar tagsoup-1.2.jar tika-core-0.7.jar tika-parsers-0.7.jar xml-apis-1.0.b2.jar xmlbeans-2.3.0.jar Thanks, Sandhya -Original Message- From: Sandhya Agarwal

RE: Problem with pdf, upgrading Cell

2010-05-03 Thread Sandhya Agarwal
Hello, Please let me know if anybody figured out a way out of this issue. Thanks, Sandhya -Original Message- From: Praveen Agrawal [mailto:pkal...@gmail.com] Sent: Friday, April 30, 2010 11:14 PM To: solr-user@lucene.apache.org Subject: Re: Problem with pdf, upgrading Cell Grant, You

RE: Problem with pdf, upgrading Cell

2010-04-30 Thread Sandhya Agarwal
I observed the same issue too, with tika 0.7 jars. It now fails to extract content from documents of any type. Works with tika 0.5 though. Thanks, Sandhya -Original Message- From: pk [mailto:pkal...@gmail.com] Sent: Friday, April 30, 2010 3:17 PM To: solr-user@lucene.apache.org

RE: Indexing metadata in solr using ContentStreamUpdateRequest

2010-04-30 Thread Sandhya Agarwal
@lucene.apache.org Subject: Re: Indexing metadata in solr using ContentStreamUpdateRequest What does your schema look like? On Apr 30, 2010, at 3:47 AM, Sandhya Agarwal wrote: Hello, I am using ContentStreamUpdateRequest, to index binary documents. At the time of indexing the content, I want to be able

Indexing zip files

2010-04-27 Thread Sandhya Agarwal
Hello, I see that solr 1.4 is bundled with tika 0.4, which does not do proper content extraction of zip files. So, I replaced tika jars with the latest tika 0.7 jars. I still see an issue and the individual files in the zip file are not being indexed. Any configuration I must do to get this

Query regarding copyField

2010-04-19 Thread Sandhya Agarwal
Hello, Is it a problem if I use *copyField* for some fields and not for others. In my query, I have both fields, the ones mentioned in copyField and ones that are not copied to a common destination. Will this cause an anomaly in my search results. I am seeing some weird behavior. Thanks,

Help using boolean operators

2010-04-19 Thread Sandhya Agarwal
Hello, I am confused about the proper usage of the Boolean operators, AND, OR and NOT. Could somebody please provide me an easy to understand explanation. Thanks, Sandhya

RE: Help using boolean operators

2010-04-19 Thread Sandhya Agarwal
Thank You Mitch. I have a query mentioned below : (my defaultOperator is set to AND) (field1 : This is a good string AND field2 : This is a good string AND field3 : This is a good string AND (field4 : ASCIIDocument OR field4 : BinaryDocument OR field4 : HTMLDocument) AND field5 : doc) This is

RE: Help using boolean operators

2010-04-19 Thread Sandhya Agarwal
Also, one of the fields here, *field3* is a dynamic field. All the other fields except this field, are copied into text with copyField. Thanks, Sandhya -Original Message- From: Sandhya Agarwal [mailto:sagar...@opentext.com] Sent: Monday, April 19, 2010 2:55 PM To: solr-user

RE: Help using boolean operators

2010-04-19 Thread Sandhya Agarwal
Thanks Erick. Using parentheses works. With parentheses, the query q=field1:(this is a good string) is parsed as follows: +field1:this +field1:good +field1:string. Is that OK to do? Thanks, Sandhya -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent:
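
A small sketch of the grouping discussed in this thread. In Lucene query syntax a field prefix applies only to the term that directly follows it, so the remaining words of an unparenthesized string are searched in the default field. Parentheses apply the field to every term, and double quotes turn the words into a phrase. The helper names are hypothetical and only build query strings.

```java
class FieldGroupingSketch {
    // field:(a b c) applies the field to every term inside the parentheses.
    static String grouped(String field, String text) {
        return field + ":(" + text + ")";
    }

    // field:"a b c" matches the words as one phrase in that field.
    static String phrase(String field, String text) {
        return field + ":\"" + text + "\"";
    }

    public static void main(String[] args) {
        System.out.println(grouped("field1", "this is a good string"));
        System.out.println(phrase("field1", "this is a good string"));
    }
}
```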

solr numeric range queries

2010-04-14 Thread Sandhya Agarwal
Hello, As I understand, we have to use the syntax { * TO value } or [ * TO value ], for queries less than value or less than or equal to value, etc; Where value is a numeric field. There is no direct < value or <= value syntax supported. Is that correct ? Thanks, Sandhya
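
The two forms mentioned above can be sketched as query-string builders. Curly braces give an exclusive bound and square brackets an inclusive one, with * leaving the lower end open. The class name, method names and the field name price are hypothetical.

```java
class RangeQuerySketch {
    // Exclusive upper bound: matches values strictly less than the given value.
    static String lessThan(String field, long value) {
        return field + ":{* TO " + value + "}";
    }

    // Inclusive upper bound: matches values less than or equal to the given value.
    static String lessThanOrEqual(String field, long value) {
        return field + ":[* TO " + value + "]";
    }

    public static void main(String[] args) {
        System.out.println(lessThan("price", 100));
        System.out.println(lessThanOrEqual("price", 100));
    }
}
```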

RE: solr numeric range queries

2010-04-14 Thread Sandhya Agarwal
To: solr-user@lucene.apache.org Subject: Re: solr numeric range queries On Apr 14, 2010, at 6:09 AM, Sandhya Agarwal wrote: Hello, As I understand, we have to use the syntax { * TO value } or [ * TO value ], for queries less than value or less than or equal to value, etc; Where value

DIH

2010-04-14 Thread Sandhya Agarwal
Hello, We want to design a solution where we have one polling directory (data source directory) containing the xml files, of all data that must be indexed. These XML files contain a reference to the content file. So, we need another datasource that must be created for the content files. Could

RE: DIH

2010-04-14 Thread Sandhya Agarwal
think) FLEP walks the directory and supplies a separate record per file. BFDS pulls the file and supplies it to TikaEntityProcessor. BinFileDataSource is not documented, but you need it for binary data streams like PDF Word. For text files, use FileDataSource. On 4/14/10, Sandhya Agarwal sagar

RE: solr numeric range queries

2010-04-14 Thread Sandhya Agarwal
or later, take a look at solr trie range support. http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/ Ankit -Original Message- From: Sandhya Agarwal [mailto:sagar...@opentext.com] Sent: Wednesday, April 14, 2010 7:56 AM To: solr-user

Internal Server Error

2010-04-13 Thread Sandhya Agarwal
Hello, I have the following piece of code : ContentStreamUpdateRequest contentUpdateRequest = new ContentStreamUpdateRequest("/update/extract"); contentUpdateRequest.addFile(new File(contentFileName)); contentUpdateRequest.setParam("extractOnly", "true"); NamedList result =

RE: Internal Server Error

2010-04-13 Thread Sandhya Agarwal
andrea.gazzar...@atcult.it wrote: Some problem with extraction (Tika, etc...)? My suggestion is : try to extract manually the document...I had a lot of problem with Tika and pdf extraction... Cheers, Andrea Il 13/04/2010 13:05, Sandhya Agarwal ha scritto: Hello, I have the following piece