RE: Problem with pdf, upgrading Cell

2010-05-11 Thread Marc Ghorayeb
Great news, thanks :) Marc

Re: Problem with pdf, upgrading Cell

2010-05-10 Thread Grant Ingersoll
, Sandhya -----Original Message----- From: Praveen Agrawal [mailto:pkal...@gmail.com] Sent: Wednesday, May 05, 2010 10:49 PM To: solr-user@lucene.apache.org Subject: Re: Problem with pdf, upgrading Cell It reports that Jukka has resolved the issue (Tika-419), and now it is waiting for Grant to verify

RE: Problem with pdf, upgrading Cell

2010-05-06 Thread Sandhya Agarwal
: Re: Problem with pdf, upgrading Cell It reports that Jukka has resolved the issue (Tika-419), and now it is waiting for Grant to verify (Solr-1902). But it seems the fix will only be available in version 0.8 of Tika. If it solves the problem, is there a way to get it now? Any SVN trunk access, etc.?

RE: Problem with pdf, upgrading Cell

2010-05-05 Thread Marc Ghorayeb
-----Original Message----- From: Sandhya Agarwal [mailto:sagar...@opentext.com] Sent: Wednesday, May 05, 2010 10:06 AM To: solr-user@lucene.apache.org Subject: RE: Problem with pdf, upgrading Cell Praveen, I only have the highlighted jars copied. Not sure if we need the other jars. Also

Re: Problem with pdf, upgrading Cell

2010-05-05 Thread Praveen Agrawal
to it the extraction library (apache solr cell jar), though you might not need it specifically inside the war file. Marc From: sagar...@opentext.com To: solr-user@lucene.apache.org Date: Wed, 5 May 2010 10:21:36 +0530 Subject: RE: Problem with pdf, upgrading Cell Looks like the highlighting may

RE: Problem with pdf, upgrading Cell

2010-05-05 Thread Marc Ghorayeb
Praveen, I am indeed using a trunk version from last week's SVN, I think. You could always try a version from the Hudson builds. I did not try this procedure with the Solr 1.4 release, though. Marc

RE: Problem with pdf, upgrading Cell

2010-05-05 Thread Sandhya Agarwal
added to it the extraction library (apache solr cell jar), though you might not need it specifically inside the war file. Marc From: sagar...@opentext.com To: solr-user@lucene.apache.org Date: Wed, 5 May 2010 10:21:36 +0530 Subject: RE: Problem with pdf, upgrading Cell Looks like

Re: Problem with pdf, upgrading Cell

2010-05-05 Thread Praveen Agrawal
Of Grant Ingersoll Sent: Tuesday, May 04, 2010 6:13 AM Cc: solr-user@lucene.apache.org Subject: Re: Problem with pdf, upgrading Cell Little more info... Seems to be a classloading issue. The tests pass, but they aren't loading the Tika libraries via the Solr ResourceLoader, whereas

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
-----Original Message----- From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll Sent: Tuesday, May 04, 2010 6:13 AM Cc: solr-user@lucene.apache.org Subject: Re: Problem with pdf, upgrading Cell Little more info... Seems to be a classloading issue. The tests pass, but they aren't

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
Subject: RE: Problem with pdf, upgrading Cell Hello, But I see that the libraries are being loaded : INFO: Adding specified lib dirs to ClassLoader May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib
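
To check which jars actually reached Solr's classloader, the startup log can be filtered for these "Adding" messages. The sketch below assumes each message has the form `INFO: Adding 'file:/...' to classloader`; the excerpt above only shows the beginning of such a line, so the "to classloader" suffix is an assumption, and the jar names in the usage are hypothetical. As Grant's reply in this thread points out, a jar showing up here only proves it reached Solr's resource classloader, not that Tika can see it.

```python
import re

# Assumed message shape: "INFO: Adding 'file:/.../some.jar' to classloader".
# The "to classloader" suffix is an assumption; the thread shows only the prefix.
_ADDED = re.compile(r"INFO: Adding '(file:[^']+)' to classloader")


def jars_added(log_lines):
    """Return the file URLs that the log reports adding to the classloader."""
    urls = []
    for line in log_lines:
        match = _ADDED.search(line)
        if match:
            urls.append(match.group(1))
    return urls


def missing_jars(log_lines, expected_names):
    """Return expected jar file names that never appear in the log, sorted."""
    loaded = {url.rsplit("/", 1)[-1] for url in jars_added(log_lines)}
    return sorted(set(expected_names) - loaded)
```

For example, `missing_jars(lines, ["tika-core-0.7.jar", "tika-parsers-0.7.jar"])` would list whichever of those two (hypothetical) jars the log never mentions.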

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Marc Ghorayeb
May 2010 13:10:25 +0530 Subject: RE: Problem with pdf, upgrading Cell Yes, Grant. You are right. Copying the Tika libraries to the Solr webapp solved the issue, and the content extraction works fine now. Thanks, Sandhya -----Original Message----- From: Sandhya Agarwal [mailto:sagar

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
-----Original Message----- From: Sandhya Agarwal [mailto:sagar...@opentext.com] Sent: Tuesday, May 04, 2010 1:10 PM To: solr-user@lucene.apache.org Subject: RE: Problem with pdf, upgrading Cell Yes, Grant. You are right. Copying the Tika libraries to the Solr webapp solved the issue, and the content

Re: Problem with pdf, upgrading Cell

2010-05-04 Thread Praveen Agrawal
: sagar...@opentext.com To: solr-user@lucene.apache.org Date: Tue, 4 May 2010 13:10:25 +0530 Subject: RE: Problem with pdf, upgrading Cell Yes, Grant. You are right. Copying the Tika libraries to the Solr webapp solved the issue, and the content extraction works fine now. Thanks, Sandhya

Re: Problem with pdf, upgrading Cell

2010-05-04 Thread Grant Ingersoll
Subject: Re: Problem with pdf, upgrading Cell Little more info... Seems to be a classloading issue. The tests pass, but they aren't loading the Tika libraries via the Solr ResourceLoader, whereas the example is. Marc, one thing to try is to unjar the Solr WAR file and put the Tika
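
Grant's suggestion (unpack the Solr WAR and put the Tika jars inside it) can be scripted. The sketch below is an illustration, not a procedure from the thread: it writes a new WAR that contains every entry of the original plus the given jars under `WEB-INF/lib/`. The file names in the usage are hypothetical.

```python
import zipfile
from pathlib import Path


def add_jars_to_war(src_war, dst_war, jar_paths):
    """Copy src_war to dst_war, adding each jar under WEB-INF/lib/.

    An existing entry with exactly the same name is replaced. An older jar
    with a different file name (e.g. another version) is kept, so stale
    versions still have to be removed separately.
    """
    new_entries = {"WEB-INF/lib/" + Path(p).name: Path(p) for p in jar_paths}
    with zipfile.ZipFile(src_war) as src, \
            zipfile.ZipFile(dst_war, "w", zipfile.ZIP_DEFLATED) as dst:
        # Copy the original entries, skipping any that are being replaced.
        for item in src.infolist():
            if item.filename not in new_entries:
                dst.writestr(item, src.read(item.filename))
        # Add the new jars from disk.
        for arcname, path in new_entries.items():
            dst.write(path, arcname)
```

Writing a second archive instead of editing in place is deliberate: Python's `zipfile` cannot replace an entry inside an existing archive, and keeping the original WAR makes it easy to roll back.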

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
PM To: solr-user@lucene.apache.org Subject: Re: Problem with pdf, upgrading Cell Yes, it is loading the libraries, but they are in a different classloader that apparently the new way Tika loads doesn't have access to. -Grant On May 4, 2010, at 3:28 AM, Sandhya Agarwal wrote: Hello
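
One way to read Grant's explanation: if Tika 0.7 locates its parsers through the standard Java service-provider mechanism (an assumption here, not confirmed in this thread), the lookup reads a `META-INF/services/...` file through a particular classloader, and a jar that is visible only to a different classloader contributes no parsers. The sketch below just lists which parser classes a jar declares in such a file; the service-file path is the assumed one.

```python
import zipfile

# Assumption: Tika's parser jar registers parsers in this service-provider file,
# following the java.util.ServiceLoader layout.
SERVICE_FILE = "META-INF/services/org.apache.tika.parser.Parser"


def declared_parsers(jar_path):
    """Return the parser class names a jar registers, or [] if it has none."""
    with zipfile.ZipFile(jar_path) as jar:
        if SERVICE_FILE not in jar.namelist():
            return []
        text = jar.read(SERVICE_FILE).decode("utf-8")
    names = []
    for line in text.splitlines():
        # In service files, '#' starts a comment; blank lines are ignored.
        line = line.split("#", 1)[0].strip()
        if line:
            names.append(line)
    return names
```

An empty result for the parsers jar would point to a packaging problem; a non-empty result combined with no extracted content is consistent with the classloader explanation above.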

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Marc Ghorayeb
Hey, I got it to work. I just redid my steps; I had forgotten several libraries that were imported through the XML. PDF extraction seems to work once again, and I have yet to find a PDF that raises an exception! Thanks for the investigation, at least we now have a fix :) Marc
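
"Libraries imported through the XML" most likely means the `<lib dir="..."/>` directives in `solrconfig.xml` (our assumption; Marc does not name the file). Listing them makes it easier to check that every directory they point to was populated with the new jars. A minimal sketch using the standard library's XML parser:

```python
import xml.etree.ElementTree as ET


def lib_directives(solrconfig_text):
    """Return (dir, regex) pairs for each <lib dir="..."> in solrconfig.xml.

    regex is None when the directive has no regex attribute. Entries of the
    form <lib path="..."/> (single files) are skipped.
    """
    root = ET.fromstring(solrconfig_text)
    return [(lib.get("dir"), lib.get("regex"))
            for lib in root.iter("lib") if lib.get("dir") is not None]
```

The returned directories are relative to the Solr instance directory, so they still need to be resolved against it before checking their contents.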

Re: Problem with pdf, upgrading Cell

2010-05-04 Thread Praveen Agrawal
I seem to have mixed results. Here is what I did: I copied the new Tika/poi/jempbox/pdfbox/fontbox/log4j jars etc. into contrib/extraction/lib (removing the old ones, of course), as well as into WEB-INF/lib of the Solr web app in Tomcat. Now it extracts content from some PDFs, but either no content from others, or
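
When jars are swapped by hand in two places, an old version left next to a new one is an easy mistake to make. The sketch below flags artifacts present in more than one version, assuming Maven-style file names of the form `<artifact>-<version>.jar` where the version starts with a digit; the jar names in the usage are illustrative only.

```python
import re
from collections import defaultdict

# Assumes Maven-style names: "<artifact>-<version>.jar", version starting with a
# digit. The lazy artifact group stops at the first "-<digit>".
_JAR = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d.*)\.jar$")


def split_jar_name(file_name):
    """Split 'pdfbox-1.1.0.jar' into ('pdfbox', '1.1.0'); None if unrecognized."""
    match = _JAR.match(file_name)
    return (match.group("artifact"), match.group("version")) if match else None


def conflicting_versions(file_names):
    """Map each artifact seen in more than one version to its versions.

    Versions are sorted as plain strings, not as version numbers.
    """
    versions = defaultdict(set)
    for name in file_names:
        parts = split_jar_name(name)
        if parts:
            versions[parts[0]].add(parts[1])
    return {artifact: sorted(v) for artifact, v in versions.items() if len(v) > 1}
```

Running `conflicting_versions` over the file names in each lib directory gives a quick list of leftovers to delete.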

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
@lucene.apache.org Subject: Re: Problem with pdf, upgrading Cell I seem to have mixed results. Here is what I did: I copied the new Tika/poi/jempbox/pdfbox/fontbox/log4j jars etc. into contrib/extraction/lib (removing the old ones, of course), as well as into WEB-INF/lib of the Solr web app in Tomcat. Now it extracts contents from

Re: Problem with pdf, upgrading Cell

2010-05-04 Thread Praveen Agrawal
, to generate all the dependencies too? Thanks, Sandhya -----Original Message----- From: Praveen Agrawal [mailto:pkal...@gmail.com] Sent: Tuesday, May 04, 2010 4:52 PM To: solr-user@lucene.apache.org Subject: Re: Problem with pdf, upgrading Cell I seem to have mixed results. Here is what I

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
@lucene.apache.org Subject: Re: Problem with pdf, upgrading Cell Yes Sandhya, I copied the new poi/jempbox/pdfbox/fontbox etc. jars too. I believe this is what you were asking. Thanks. On Tue, May 4, 2010 at 5:01 PM, Sandhya Agarwal sagar...@opentext.com wrote: Praveen, Along with the tika core

Re: Problem with pdf, upgrading Cell

2010-05-04 Thread Praveen Agrawal
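and parser jars, did you run mvn dependency:copy-dependencies, to generate all the dependencies too? Thanks, Sandhya -----Original Message----- From: Praveen Agrawal [mailto:pkal...@gmail.com] Sent: Tuesday, May 04, 2010 4:52 PM To: solr-user@lucene.apache.org Subject: Re: Problem with pdf, upgrading Cell I seem to have mixed results. Here is what I did: I copied new Tika/poi/jempbox/pdfbox/fontbox/log4j jars etc. into contrib/extraction/lib (removing the old ones, of course), as well as into WEB-INF/lib of the Solr web app in Tomcat. Now it extracts content from some PDFs, but either no content from others, or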
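
By default, `mvn dependency:copy-dependencies` writes the jars to `target/dependency` of the project it runs in. The sketch below copies everything from such a directory into both places used in this thread (contrib/extraction/lib and the webapp's WEB-INF/lib). The directory names in the usage are hypothetical; adjust them to your own layout.

```python
import shutil
from pathlib import Path


def copy_jars(src_dir, dest_dirs):
    """Copy every *.jar in src_dir into each destination directory.

    Returns the sorted list of copied file names. Files with the same name
    are overwritten; older versions under other names are left untouched.
    """
    jars = sorted(Path(src_dir).glob("*.jar"))
    for dest in dest_dirs:
        dest = Path(dest)
        dest.mkdir(parents=True, exist_ok=True)
        for jar in jars:
            shutil.copy2(jar, dest / jar.name)
    return [jar.name for jar in jars]
```

Because old versions are not removed automatically, it is worth clearing out the previous Tika/PDFBox jars first, as Praveen did.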

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
Both the files work for me, Praveen. Thanks, Sandhya From: Praveen Agrawal [mailto:pkal...@gmail.com] Sent: Tuesday, May 04, 2010 5:22 PM To: solr-user@lucene.apache.org Subject: Re: Problem with pdf, upgrading Cell another one here.. On Tue, May 4, 2010 at 5:20 PM, Praveen Agrawal pkal

Re: Problem with pdf, upgrading Cell

2010-05-04 Thread Praveen Agrawal
] Sent: Tuesday, May 04, 2010 4:52 PM To: solr-user@lucene.apache.org Subject: Re: Problem with pdf, upgrading Cell I seem to have mixed results. Here is what I did: I copied new Tika/poi/jempbox/pdfbox/fontbox/log4j jars etc. into contrib/extraction/lib (removing the old ones, of course), as well

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Marc Ghorayeb
if I am wrong here :) Marc Date: Tue, 4 May 2010 11:58:56 + From: pkal...@gmail.com To: solr-user@lucene.apache.org Subject: Re: Problem with pdf, upgrading Cell This email contained a .zip file attachment. Raytheon does not allow email attachments that are considered likely

Re: Problem with pdf, upgrading Cell

2010-05-04 Thread Praveen Agrawal
, Sandhya From: Praveen Agrawal [mailto:pkal...@gmail.com] Sent: Tuesday, May 04, 2010 5:22 PM To: solr-user@lucene.apache.org Subject: Re: Problem with pdf, upgrading Cell another one here.. On Tue, May 4, 2010 at 5:20 PM, Praveen Agrawal pkal...@gmail.com wrote

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
To: solr-user@lucene.apache.org Subject: Re: Problem with pdf, upgrading Cell Hi Sandhya.. I must be missing something. I copied all the dependency jars to both the contrib/extraction/lib and WEB-INF/lib folders. Here is the list of jars copied: asm-3.1.jar bcmail-jdk15-1.45.jar bcprov-jdk15-1.45
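
Since Praveen and Sandhya ended up with different jar sets in different folders, a quick diff of two lib directories helps when comparing setups. A minimal sketch that compares file names only (not jar contents); the directory names are whatever your installation uses.

```python
from pathlib import Path


def jar_names(directory):
    """Return the set of *.jar file names directly inside directory."""
    return {p.name for p in Path(directory).glob("*.jar")}


def compare_lib_dirs(dir_a, dir_b):
    """Return (only_in_a, only_in_b) as sorted lists of jar file names."""
    a, b = jar_names(dir_a), jar_names(dir_b)
    return sorted(a - b), sorted(b - a)
```

Running it on one working and one failing installation (for example, each side's WEB-INF/lib) shows which jars differ between them.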

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
[mailto:sagar...@opentext.com] Sent: Wednesday, May 05, 2010 10:06 AM To: solr-user@lucene.apache.org Subject: RE: Problem with pdf, upgrading Cell Praveen, I only have the highlighted jars copied. Not sure, if we need the other jars. Also, I copied the jars directly into solr\WEB-INF\lib

RE: Problem with pdf, upgrading Cell

2010-05-03 Thread Sandhya Agarwal
Hello, Please let me know if anybody figured out a way out of this issue. Thanks, Sandhya -----Original Message----- From: Praveen Agrawal [mailto:pkal...@gmail.com] Sent: Friday, April 30, 2010 11:14 PM To: solr-user@lucene.apache.org Subject: Re: Problem with pdf, upgrading Cell Grant, You

RE: Problem with pdf, upgrading Cell

2010-05-03 Thread Marc Ghorayeb
Hi Grant, I confirm what Praveen has said: none of the PDFs I try work with the new Tika and SVN versions. :( Marc From: sagar...@opentext.com To: solr-user@lucene.apache.org Date: Mon, 3 May 2010 13:05:24 +0530 Subject: RE: Problem with pdf, upgrading Cell Hello, Please let me know

Re: Problem with pdf, upgrading Cell

2010-05-03 Thread Grant Ingersoll
Subject: RE: Problem with pdf, upgrading Cell Hello, Please let me know if anybody figured out a way out of this issue. Thanks, Sandhya -----Original Message----- From: Praveen Agrawal [mailto:pkal...@gmail.com] Sent: Friday, April 30, 2010 11:14 PM To: solr-user@lucene.apache.org

Re: Problem with pdf, upgrading Cell

2010-05-03 Thread Grant Ingersoll
To: solr-user@lucene.apache.org Date: Mon, 3 May 2010 13:05:24 +0530 Subject: RE: Problem with pdf, upgrading Cell Hello, Please let me know if anybody figured out a way out of this issue. Thanks, Sandhya -----Original Message----- From: Praveen Agrawal [mailto:pkal...@gmail.com] Sent

Re: Problem with pdf, upgrading Cell

2010-05-03 Thread Grant Ingersoll
with the new Tika and SVN versions. :( Marc From: sagar...@opentext.com To: solr-user@lucene.apache.org Date: Mon, 3 May 2010 13:05:24 +0530 Subject: RE: Problem with pdf, upgrading Cell Hello, Please let me know if anybody figured out a way out of this issue. Thanks, Sandhya

RE: Problem with pdf, upgrading Cell

2010-04-30 Thread pk
! It now doesn't even extract content from PDFs it was able to handle earlier (v0.4). Strange...

RE: Problem with pdf, upgrading Cell

2010-04-30 Thread Sandhya Agarwal
: RE: Problem with pdf, upgrading Cell Marc, did you manage to get it to work? I tried the latest Tika (0.7) from the command line and it successfully parsed the previously problematic PDF. Then I replaced the Tika-related jars in the Solr 1.4 contrib/extraction/lib folder with the new ones. Now it doesn't throw any exception

Re: Problem with pdf, upgrading Cell

2010-04-30 Thread Grant Ingersoll
? Marc From: dekay...@hotmail.com To: solr-user@lucene.apache.org Subject: RE: Problem with pdf, upgrading Cell Date: Fri, 23 Apr 2010 16:03:28 +0200 Seems like I'm not the only one with this no-extraction problem: http://www.mail-archive.com/solr-user@lucene.apache.org/msg33609

Re: Problem with pdf, upgrading Cell

2010-04-30 Thread Praveen Agrawal
not receive the ContentStream (it is set to null...). Maybe I should send this to the developer mailing list? Marc From: dekay...@hotmail.com To: solr-user@lucene.apache.org Subject: RE: Problem with pdf, upgrading Cell Date: Fri, 23 Apr 2010 16:03:28 +0200 Seems like I'm not the only

Re: Problem with pdf, upgrading Cell

2010-04-30 Thread Marc Ghorayeb
Hi, Nope, I didn't get it to work... Just like you, the command-line version of Tika extracts the content correctly, but once included in Solr, no content is extracted. What I have tried so far: - Updating the Tika libraries inside the public Solr 1.4 release, no luck there. - Downloading the latest SVN

Re: Problem with pdf, upgrading Cell

2010-04-30 Thread Grant Ingersoll
Praveen and Marc, Can you share the PDF that fails in Solr (feel free to send it to my private email)? Thanks, Grant On Apr 30, 2010, at 7:55 AM, Marc Ghorayeb wrote: Hi, Nope, I didn't get it to work... Just like you, the command-line version of Tika extracts the content correctly, but once

Re: Problem with pdf, upgrading Cell

2010-04-30 Thread Praveen Agrawal
Grant, You can try any of the sample PDFs that come in the /docs folder of the Solr 1.4 distribution. I tried 'Installing Solr in Tomcat.pdf', 'index.pdf', etc. Only metadata, i.e. stream_size and content_type, is indexed apart from my own literals; the content is missing. On Fri, Apr 30, 2010 at 8:52 PM,
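
Praveen's symptom (only stream_size, content_type and the literals are indexed) can be narrowed down by asking the extract handler to return what Tika extracted instead of indexing it. Solr Cell accepts `literal.<field>` parameters and an `extractOnly=true` flag; the base URL, the port 8983 and the stored field name `content` used below are assumptions about a typical setup, not details from the thread.

```python
from urllib.parse import urlencode


def extract_url(base_url, literals, extract_only=False, commit=True):
    """Build a Solr Cell /update/extract URL.

    literals maps field names to values, sent as literal.<field>=<value>.
    extract_only=True asks Solr to return the extracted text instead of
    indexing it, which shows directly whether Tika produced any content.
    """
    params = [("literal." + field, value) for field, value in literals.items()]
    if extract_only:
        params.append(("extractOnly", "true"))
    elif commit:
        params.append(("commit", "true"))
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


def has_content(doc, field="content"):
    """True if a returned document has a non-blank value in `field`.

    The default field name "content" is an assumption; use whatever field
    your schema maps the extracted body to.
    """
    value = doc.get(field)
    if isinstance(value, list):
        value = " ".join(value)
    return bool(value and value.strip())
```

For example, `extract_url("http://localhost:8983/solr", {"id": "doc1"})` gives `http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true`. If the `extractOnly=true` response is also empty, the problem lies in the Tika setup rather than in field mapping.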

RE: Problem with pdf, upgrading Cell

2010-04-26 Thread Marc Ghorayeb
To: solr-user@lucene.apache.org Subject: RE: Problem with pdf, upgrading Cell Date: Fri, 23 Apr 2010 16:03:28 +0200 Seems like I'm not the only one with this no-extraction problem: http://www.mail-archive.com/solr-user@lucene.apache.org/msg33609.html Apparently he tried the same thing

Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb
Hello, I configured a Solr server to be able to extract data from various documents, including PDFs. Unfortunately, the data extraction fails on several PDFs. I have read around here that this may be due to the old Tika library being used? I looked around and saw that the SVN had a newer

Re: Problem with pdf, upgrading Cell

2010-04-23 Thread Otis Gospodnetic
AM Subject: Problem with pdf, upgrading Cell Hello, I configured a Solr server to be able to extract data from various documents, including PDFs. Unfortunately, the data extraction fails on several PDFs. I have read around here that this may be due to the old Tika library being used? I

RE: Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb
: otis_gospodne...@yahoo.com Subject: Re: Problem with pdf, upgrading Cell To: solr-user@lucene.apache.org Marc, got anything in your logs? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message

RE: Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb
To: solr-user@lucene.apache.org Subject: RE: Problem with pdf, upgrading Cell Date: Fri, 23 Apr 2010 15:12:39 +0200 I'm launching it with the start.jar utility, and there doesn't seem to be anything weird inside the console when I upload a PDF. Is there a way to output the console to a log
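
The simplest way to capture the console is plain shell redirection (`java -jar start.jar > solr.log 2>&1`). Alternatively, the log lines quoted in this thread look like `java.util.logging` output, so one option, assuming Solr logs through `java.util.logging` in this setup, is to point the JVM at a logging config with a `FileHandler`. The sketch writes such a properties file and builds the launch command; the file names and the `logs/` directory are hypothetical, and `FileHandler` does not create missing directories.

```python
from pathlib import Path


def write_logging_properties(path, log_pattern="logs/solr%u.log", level="INFO"):
    """Write a java.util.logging config that logs to a file and the console.

    log_pattern is a FileHandler pattern (%u = unique number). Its directory
    must already exist, since FileHandler does not create it.
    """
    lines = [
        "handlers = java.util.logging.FileHandler, java.util.logging.ConsoleHandler",
        ".level = " + level,
        "java.util.logging.FileHandler.pattern = " + log_pattern,
        "java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter",
        "java.util.logging.FileHandler.append = true",
    ]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def start_command(properties_path):
    """Command line for the example launcher using that logging config."""
    return ["java", "-Djava.util.logging.config.file=" + str(properties_path),
            "-jar", "start.jar"]
```

The returned list can be passed to `subprocess.run` from the example directory, or simply typed at the command prompt.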

RE: Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb
Seems like I'm not the only one with this no-extraction problem: http://www.mail-archive.com/solr-user@lucene.apache.org/msg33609.html Apparently he tried the same thing, building from the trunk and indexing a PDF, and no extraction occurred... Strange. Marc G.

Re: Problem with pdf, upgrading Cell

2010-04-23 Thread Otis Gospodnetic
@lucene.apache.org Sent: Fri, April 23, 2010 9:12:39 AM Subject: RE: Problem with pdf, upgrading Cell I'm launching it with the start.jar utility, and there doesn't seem to be anything weird inside the console when I upload a PDF. Is there a way to output the console to a log file? The only log file

RE: Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb
=true&maxSegments=1&waitFlush=true&wt=javabin&version=1} status=0 QTime=46 Date: Fri, 23 Apr 2010 08:03:14 -0700 From: otis_gospodne...@yahoo.com Subject: Re: Problem with pdf, upgrading Cell To: solr-user@lucene.apache.org Marc, These are your request logs. You want to look at your Solr logs
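
Otis's distinction matters because a request-log line only records the handler path, the parameters, the status and QTime. A `status=0` there says the request completed, not that any text was extracted. The sketch parses such a line, assuming the usual `path=... params={a=1&b=2} status=N QTime=N` layout; the example line in the usage is illustrative, not the one Marc posted.

```python
import re

# Assumed layout of a Solr request-log line:
#   ... path=/update params={k1=v1&k2=v2} status=0 QTime=46
_REQUEST = re.compile(
    r"path=(?P<path>\S+) params=\{(?P<params>[^}]*)\} "
    r"status=(?P<status>\d+) QTime=(?P<qtime>\d+)"
)


def parse_request_line(line):
    """Return path, params, status and QTime from a request-log line, or None."""
    match = _REQUEST.search(line)
    if not match:
        return None
    params = {}
    for pair in match.group("params").split("&"):
        if pair:
            key, _, value = pair.partition("=")
            params[key] = value
    return {
        "path": match.group("path"),
        "params": params,
        "status": int(match.group("status")),
        "qtime_ms": int(match.group("qtime")),
    }
```

Lines for which this returns None (such as the SolrResourceLoader messages quoted earlier) belong to the Solr log proper, which is where extraction errors would show up.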

Re: Problem with pdf, upgrading Cell

2010-04-23 Thread Paul Borgermans
On Fri, Apr 23, 2010 at 5:48 PM, Marc Ghorayeb dekay...@hotmail.com wrote: Yes, the only log I can actually get is the one in the Windows command console, and there are no errors there... Here are the last lines when I upload a PDF to the update/extract URL: snip I am pretty sure it is