Hello,
I have a file with the input string 91{40}9490949090, and I want this file to be
returned when I search for the query string +91?40?9*. The problem is that the
input string is being indexed as three terms: 91, 40, 9490949090. Is there a way
to treat { and } as part of the string?
Hello,
We have observed that Tika does not extract the Content-Language for documents
encoded in UTF-8. For natively encoded documents it works fine. Any idea how we
can resolve this?
Thanks,
Sandhya
Yes, I did, but I don't find a SolrJ example there. The example in the doc uses
curl.
- Sent from iPhone
On 07-May-2010, at 8:12 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
: Sorry. That is what I meant. But, I put it wrongly. I have not been
: able to find examples of using
, 2010, at 3:28 AM, Sandhya Agarwal wrote:
Hello,
But I see that the libraries are being loaded :
INFO: Adding specified lib dirs to ClassLoader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/C:/apache-solr-1.4.0
Hello,
Can somebody please point me to an example, of how we can leverage
*stream.file* for streaming documents, using UpdateRequest API. (SolrJ API)
Thanks,
Sandhya
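(For later readers, a hedged SolrJ sketch; class and method names are from the Solr 1.4 SolrJ API, while the URL, file path, and literal values are placeholders. The key point is that stream.file is just a request parameter, so you set it with setParam rather than attaching a local file.)

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class StreamFileExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // No addFile() here: stream.file tells the *server* to read the file
        // from its own filesystem, mirroring curl's ...?stream.file=/path form.
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.setParam("stream.file", "/data/docs/sample.pdf"); // path on the server
        req.setParam("literal.id", "doc1");
        req.setParam("commit", "true");

        server.request(req);
    }
}
```

Note that remote streaming (enableRemoteStreaming) must be switched on in solrconfig.xml for the server to accept stream.file.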
Sorry, that is what I meant, but I put it wrongly. I have not been able to find
examples of using SolrJ for this.
- Sent from iPhone
On 07-May-2010, at 1:23 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:
: Subject: Example of using stream.file to post a binary file to
solr
From: Sandhya Agarwal [mailto:sagar...@opentext.com]
Sent: Wednesday, May 05, 2010 10:06 AM
To: solr-user@lucene.apache.org
Subject: RE: Problem with pdf, upgrading Cell
Praveen,
I only have the highlighted jars copied; not sure if we need the other
jars. Also, I
Hello,
But I see that the libraries are being loaded :
INFO: Adding specified lib dirs to ClassLoader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/asm-3.1.jar' to
classloader
May 4,
Yes, Grant, you are right. Copying the Tika libraries into the Solr webapp solved
the issue, and content extraction works fine now.
Thanks,
Sandhya
-Original Message-
From: Sandhya Agarwal [mailto:sagar...@opentext.com]
Sent: Tuesday, May 04, 2010 12:58 PM
To: solr-user@lucene.apache.org
-Original Message-
From: Sandhya Agarwal [mailto:sagar...@opentext.com]
Sent: Tuesday, May 04, 2010 1:10 PM
To: solr-user@lucene.apache.org
Subject: RE: Problem with pdf, upgrading Cell
Yes, Grant, you are right. Copying the Tika libraries into the Solr webapp solved
the issue and the content
PM
To: solr-user@lucene.apache.org
Subject: Re: Problem with pdf, upgrading Cell
Yes, it is loading the libraries, but they are in a different classloader, which
the new way Tika loads classes apparently doesn't have access to.
-Grant
On May 4, 2010, at 3:28 AM, Sandhya Agarwal wrote:
Hello
Praveen,
Along with the tika core and parser jars, did you run mvn
dependency:copy-dependencies to generate all the dependencies too?
Thanks,
Sandhya
-Original Message-
From: Praveen Agrawal [mailto:pkal...@gmail.com]
Sent: Tuesday, May 04, 2010 4:52 PM
To: solr-user@lucene.apache.org
Subject: Re: Problem with pdf, upgrading Cell
Yes Sandhya,
I copied the new poi/jempbox/pdfbox/fontbox etc. jars too. I believe this is what
you were asking.
Thanks.
On Tue, May 4, 2010 at 5:01 PM, Sandhya Agarwal sagar...@opentext.com wrote:
Praveen,
Along with the tika core
I copied the new poi/jempbox/pdfbox/fontbox etc. jars too. I believe this is what
you were asking.
Thanks.
On Tue, May 4, 2010 at 5:01 PM, Sandhya Agarwal
sagar...@opentext.com wrote:
Praveen,
Along with the tika core and parser jars, did you run mvn
dependency:copy
On Tue, May 4, 2010 at 5:28 PM, Sandhya Agarwal sagar...@opentext.com wrote:
Both the files work for me, Praveen.
Thanks,
Sandhya
From: Praveen Agrawal [mailto:pkal...@gmail.com]
Sent: Tuesday, May 04, 2010 5:22 PM
To: solr-user@lucene.apache.org
Subject: Re: Problem with pdf
.jar
metadata-extractor-2.4.0-beta-1.jar
pdfbox-1.1.0.jar
poi-3.6.jar
poi-ooxml-3.6.jar
poi-ooxml-schemas-3.6.jar
poi-scratchpad-3.6.jar
tagsoup-1.2.jar
tika-core-0.7.jar
tika-parsers-0.7.jar
xml-apis-1.0.b2.jar
xmlbeans-2.3.0.jar
Thanks,
Sandhya
-Original Message-
From: Sandhya Agarwal
Hello,
Please let me know if anybody figured out a way out of this issue.
Thanks,
Sandhya
-Original Message-
From: Praveen Agrawal [mailto:pkal...@gmail.com]
Sent: Friday, April 30, 2010 11:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Problem with pdf, upgrading Cell
Grant,
You
I observed the same issue with the Tika 0.7 jars: it now fails to extract
content from documents of any type, though it works with Tika 0.5.
Thanks,
Sandhya
-Original Message-
From: pk [mailto:pkal...@gmail.com]
Sent: Friday, April 30, 2010 3:17 PM
To: solr-user@lucene.apache.org
@lucene.apache.org
Subject: Re: Indexing metadata in solr using ContentStreamUpdateRequest
What does your schema look like?
On Apr 30, 2010, at 3:47 AM, Sandhya Agarwal wrote:
Hello,
I am using ContentStreamUpdateRequest, to index binary documents. At the time
of indexing the content, I want to be able
Hello,
I see that Solr 1.4 is bundled with Tika 0.4, which does not properly extract
content from zip files, so I replaced the Tika jars with the latest Tika 0.7
jars. I still see an issue: the individual files in the zip file are not
being indexed. Is there any configuration I must do to get this
Hello,
Is it a problem if I use *copyField* for some fields and not for others? My
query contains both kinds of fields: ones mentioned in copyField and ones that
are not copied to a common destination. Will this cause an anomaly in my search
results? I am seeing some weird behavior.
Thanks,
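(One likely source of the weird behavior, offered as an assumption with invented field names: terms without an explicit field prefix are searched only in the default search field, so fields that are not copied into it will never match such bare terms.)

```xml
<!-- schema.xml sketch: field1/field2 match bare terms via the default field;
     field3 matches only explicit field3:term queries -->
<copyField source="field1" dest="text"/>
<copyField source="field2" dest="text"/>
<!-- no copyField for field3 -->
<defaultSearchField>text</defaultSearchField>
```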
Hello,
I am confused about the proper usage of the Boolean operators AND, OR, and NOT.
Could somebody please provide an easy-to-understand explanation?
Thanks,
Sandhya
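(For later readers, the usual summary of how the Lucene query parser treats these, not specific to this thread: the Boolean operators are rewritten into required (+) and prohibited (-) clauses.)

```
a AND b   ->  +a +b    both terms required
a OR b    ->   a  b    either term suffices (with default operator OR)
a NOT b   ->  +a -b    a required, b must be absent
```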
Thank You Mitch.
I have the query mentioned below (my defaultOperator is set to AND):
(field1 : This is a good string AND field2 : This is a good string AND field3 :
This is a good string AND (field4 : ASCIIDocument OR field4 : BinaryDocument OR
field4 : HTMLDocument) AND field5 : doc)
This is
Also, one of the fields here, *field3*, is a dynamic field. All the other fields
except this one are copied into text with copyField.
Thanks,
Sandhya
-Original Message-
From: Sandhya Agarwal [mailto:sagar...@opentext.com]
Sent: Monday, April 19, 2010 2:55 PM
To: solr-user
Thanks Erick, using parentheses works.
With parentheses, the query q=field1:(this is a good string) is parsed as
follows:
+field1:this +field1:good +field1:string
Is that OK to do?
Thanks,
Sandhya
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent:
Hello,
As I understand it, we have to use the syntax { * TO value } or [ * TO value ]
for queries less than value, or less than or equal to value, etc.,
where value is a numeric field.
There is no direct < value or <= value syntax supported. Is that correct?
Thanks,
Sandhya
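(For later readers, the bracket forms cover all four comparison cases; the field name price below is just an example.)

```
price:[* TO 100]    price <= 100
price:{* TO 100}    price <  100
price:[100 TO *]    price >= 100
price:{100 TO *}    price >  100
```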
To: solr-user@lucene.apache.org
Subject: Re: solr numeric range queries
On Apr 14, 2010, at 6:09 AM, Sandhya Agarwal wrote:
Hello,
As I understand, we have to use the syntax { * TO value } or [ *
TO value ], for queries less than value or less than or equal
to value, etc;
Where value
Hello,
We want to design a solution where we have one polling directory (data source
directory) containing the xml files, of all data that must be indexed. These
XML files contain a reference to the content file. So, we need another
datasource that must be created for the content files. Could
think)
FileListEntityProcessor (FLEP) walks the directory and supplies a separate
record per file. BinFileDataSource (BFDS) pulls the file and supplies it to
TikaEntityProcessor. BinFileDataSource is not documented, but you need it for
binary data streams like PDF or Word. For text files, use FileDataSource.
On 4/14/10, Sandhya Agarwal sagar
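(Putting those pieces together, a data-config.xml sketch; the paths, entity names, regex, and destination field are placeholders, and TikaEntityProcessor requires the DIH extras jar that ships it.)

```xml
<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- FileListEntityProcessor walks the polling directory, one row per file -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/data/polling" fileName=".*\.(pdf|doc)" recursive="true"
            rootEntity="false">
      <!-- BinFileDataSource hands each file to Tika for text extraction -->
      <entity name="doc" processor="TikaEntityProcessor" dataSource="bin"
              url="${files.fileAbsolutePath}" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```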
or later, take a look at Solr's
trie range support.
http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/
Ankit
-Original Message-
From: Sandhya Agarwal [mailto:sagar...@opentext.com]
Sent: Wednesday, April 14, 2010 7:56 AM
To: solr-user
Hello,
I have the following piece of code:
ContentStreamUpdateRequest contentUpdateRequest = new
ContentStreamUpdateRequest("/update/extract");
contentUpdateRequest.addFile(new File(contentFileName));
contentUpdateRequest.setParam("extractOnly", "true");
NamedList result =
andrea.gazzar...@atcult.it wrote:
Some problem with extraction (Tika, etc.)? My suggestion is: try to
extract the document manually... I had a lot of problems with Tika and PDF
extraction...
Cheers,
Andrea
On 13/04/2010 13:05, Sandhya Agarwal wrote:
Hello,
I have the following piece