RE: Spellcheck help

2010-07-27 Thread Marc Ghorayeb

Thanks for the input, I'll check it out!
Marc

 Subject: RE: Spellcheck help
 Date: Fri, 23 Jul 2010 13:12:04 -0500
 From: james.d...@ingrambook.com
 To: solr-user@lucene.apache.org
 
 In org.apache.solr.spelling.SpellingQueryConverter, find the line (#84):
 
 final static String PATTERN = "(?:(?!(" + NMTOKEN + ":|\\d+)))[\\p{L}_\\-0-9]+";
 
 and remove the |\\d+ to make it:
 
 final static String PATTERN = "(?:(?!" + NMTOKEN + ":))[\\p{L}_\\-0-9]+";
 
 My testing shows this solves your problem.  The caveat is to test it against 
 all your use cases, because evidently someone thought we should ignore 
 leading digits in keywords.  Surely there's a reason why, although I can't 
 think of it.
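
 A minimal sketch of the effect of that change, runnable outside Solr. The NMTOKEN below is a simplified stand-in (an assumption for illustration, not the definition used in SpellingQueryConverter), and the class and method names are hypothetical. It only shows which token each pattern picks out of a query that starts with a digit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpellingPatternDemo {
    // Simplified stand-in for Solr's NMTOKEN (assumption, not the real definition).
    static final String NMTOKEN = "[A-Za-z_][A-Za-z0-9_.\\-]*";

    // Pattern with the |\\d+ alternative: a token may not start at a digit.
    static final String ORIGINAL =
        "(?:(?!(" + NMTOKEN + ":|\\d+)))[\\p{L}_\\-0-9]+";

    // Pattern after removing |\\d+: only "fieldname:" prefixes are skipped.
    static final String PATCHED =
        "(?:(?!" + NMTOKEN + ":))[\\p{L}_\\-0-9]+";

    // Returns the first token the pattern finds in the query, or null if none.
    static String firstToken(String regex, String query) {
        Matcher m = Pattern.compile(regex).matcher(query);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstToken(ORIGINAL, "3dsmax")); // leading digit skipped
        System.out.println(firstToken(PATCHED, "3dsmax"));  // whole word kept
    }
}
```

 With the original pattern the lookahead rejects position 0 because \\d+ matches there, so matching starts after the leading digit; the patched pattern keeps the digit as part of the token.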
 
 James Dyer
 E-Commerce Systems
 Ingram Book Company
 (615) 213-4311
 
 -Original Message-
 From: dekay...@hotmail.com [mailto:dekay...@hotmail.com] 
 Sent: Saturday, July 17, 2010 12:41 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Spellcheck help
 
 Can anybody help me with this? :(
 
 -Original Message- 
 From: Marc Ghorayeb
 Sent: Thursday, July 08, 2010 9:46 AM
 To: solr-user@lucene.apache.org
 Subject: Spellcheck help
 
 
 Hello,
 I've been trying to get rid of a bug when using the spellcheck, but so far 
 with no success :(
 When searching for a word that starts with a number, for example 3dsmax, I 
 get the results that I want, BUT the spellcheck says it is not correctly 
 spelled AND the collation gives me 33dsmax. Further investigation shows that 
 the spellcheck is actually only checking dsmax, which it considers does not 
 exist, and it suggests 3dsmax for better results. But since I have 
 spellcheck.collate = true, the collation I get is 33dsmax, with the first 3 
 being the one discarded by the spellchecker... Otherwise, the spellcheck 
 works correctly for normal words... any ideas? :(
 My spellcheck field is fairly classic: whitespace tokenizer, with lowercase 
 filter...
 Any help would be greatly appreciated :)
 Thanks,
 Marc

Spellcheck help

2010-07-08 Thread Marc Ghorayeb

Hello,
I've been trying to get rid of a bug when using the spellcheck, but so far 
with no success :(
When searching for a word that starts with a number, for example 3dsmax, I get 
the results that I want, BUT the spellcheck says it is not correctly spelled 
AND the collation gives me 33dsmax. Further investigation shows that the 
spellcheck is actually only checking dsmax, which it considers does not exist, 
and it suggests 3dsmax for better results. But since I have 
spellcheck.collate = true, the collation I get is 33dsmax, with the first 3 
being the one discarded by the spellchecker... Otherwise, the spellcheck works 
correctly for normal words... any ideas? :(
My spellcheck field is fairly classic: whitespace tokenizer, with lowercase 
filter...
Any help would be greatly appreciated :)
Thanks,
Marc
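
The doubled digit can be reproduced with a small sketch. It assumes that collation substitutes each suggestion into the original query string at the offsets of the token that was checked, which is consistent with the behavior described above but is not taken from the Solr source. The class and method names are hypothetical.

```java
public class CollationSketch {
    // Replace the characters [start, end) of the query with the suggestion
    // (assumed collation strategy, for illustration only).
    static String collate(String query, int start, int end, String suggestion) {
        return new StringBuilder(query).replace(start, end, suggestion).toString();
    }

    public static void main(String[] args) {
        // The leading "3" was skipped, so the checked token is "dsmax",
        // covering offsets 1..6 of the query "3dsmax".
        // The suggestion for "dsmax" is "3dsmax".
        System.out.println(collate("3dsmax", 1, 6, "3dsmax"));
    }
}
```

Because the discarded 3 stays in place in front of the substituted suggestion, the collated query becomes 33dsmax.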

Strange query behavior

2010-06-28 Thread Marc Ghorayeb

Hello,
I have a title that says 3DVIA Studio amp; Virtools Maya and 3dsMax 
Exporters. The analysis tool for this field gives me these 
tokens:3dviadviastudio;virtoolmaya3dsmaxdssystèmmaxexport


However, when I search for 3dsmax, I get no results :( Furthermore, if I 
search for dsmax, the spellchecker suggests 3dsmax even though it doesn't 
find any results for it. If I search for any other token (3dvia, or max, for 
example), the document is found. 3dsmax is the only token that doesn't seem 
to work!! :(
Here is my schema for this field:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>

    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            splitOnCaseChange="1"
            preserveOriginal="1"/>

    <filter class="solr.TrimFilterFactory" updateOffsets="true"/>
    <filter class="solr.LengthFilterFactory" min="2" max="15"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>

    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
            language="${Language}" protected="protwords.txt"/>
  </analyzer>

  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>

    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            splitOnCaseChange="1"
            preserveOriginal="1"/>

    <filter class="solr.TrimFilterFactory" updateOffsets="true"/>
    <filter class="solr.LengthFilterFactory" min="2" max="15"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
            language="${Language}" protected="protwords.txt"/>
  </analyzer>
</fieldType>
Can anyone help me out please? :(
PS: the ${Language} is set to en (for english) in this case...
  

RE: Copyfield multi valued to single value

2010-06-15 Thread Marc Ghorayeb

Thanks for the update, i'll have to find another way then :s.
Marc

 Date: Mon, 14 Jun 2010 13:44:30 -0700
 From: hossman_luc...@fucit.org
 To: solr-user@lucene.apache.org
 Subject: Re: Copyfield multi valued to single value
 
 
 : Is there a way to copy a multivalued field to a single value by taking 
 : for example the first index of the multivalued field?
 
 Unfortunately no.  This would either need to be done with an 
 UpdateProcessor, or on the client constructing the doc (either the remote 
 client, or in your DIH config if that's how you are using Tika)
 
 
 
 -Hoss
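
 A minimal client-side sketch of the second option, using plain Java collections rather than any Solr client API. The document is modeled as a map from field name to values, and the target field name title_sort is hypothetical (it would have to be declared as a single-valued field in the schema).

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FirstValueCopy {
    // Returns a copy of the document in which `target` holds only the first
    // value of the multivalued `source` field. If `source` is missing or
    // empty, no `target` field is added.
    static Map<String, List<String>> withFirstValue(
            Map<String, List<String>> doc, String source, String target) {
        Map<String, List<String>> copy = new LinkedHashMap<>(doc);
        List<String> values = doc.get(source);
        if (values != null && !values.isEmpty()) {
            copy.put(target, Collections.singletonList(values.get(0)));
        }
        return copy;
    }
}
```

 The client would call something like withFirstValue(doc, "title", "title_sort") before sending the document, and then sort on the single-valued field.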
 
  

Copyfield multi valued to single value

2010-06-09 Thread Marc Ghorayeb

Hello,
Is there a way to copy a multivalued field to a single value by taking for 
example the first index of the multivalued field?
I am actually trying to sort my index by title. My index contains 
Tika-extracted titles, which come in as multiple values, which is why my 
title field is multivalued. However, when I sort on the title field, it 
crashes, I guess because it cannot compare two arrays, which is logical. So 
my thought was to copy only one value from the array to another field.
Maybe there is another way to do that? Can anyone help me?
Thanks in advance!
Marc  

RE: Problem with pdf, upgrading Cell

2010-05-11 Thread Marc Ghorayeb

Great news, thanks :)
Marc  

RE: Problem with pdf, upgrading Cell

2010-05-05 Thread Marc Ghorayeb
 
  mailto:sagar...@opentext.com wrote:

  Praveen,

  Along with the Tika core and parser jars, did you also run
  mvn dependency:copy-dependencies to generate all the dependencies?

  Thanks,
  Sandhya

  -Original Message-
  From: Praveen Agrawal [mailto:pkal...@gmail.com]
  Sent: Tuesday, May 04, 2010 4:52 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Problem with pdf, upgrading Cell

  I seem to have mixed results.

  Here is what I did:
  copied the new Tika/poi/jempbox/pdfbox/fontbox/log4j jars etc. into
  contrib/extraction/lib (of course, removed the old ones), as well as into
  web-inf/lib of the Solr web app in Tomcat.

  Now it extracts content from some PDFs, but either no content from others,
  or only a line of content. For example, /docs/Installing Solr in Tomcat.pdf
  still shows no content. I have two other PDFs for which it extracts only
  one line of content.

  Also, I'm now getting a single value in the 'title' field for some PDFs,
  and two for others. Where it can extract the full content, it shows the
  title that I gave as a literal while submitting the PDF. For PDFs where no
  content was extracted, it shows one empty title and mine. For PDFs where it
  extracted only one line of content, it shows that line as a title too, plus
  mine. The 'title' field is defined as multivalued in the schema.

  Any idea what's going on? Or am I missing something?

  On Tue, May 4, 2010 at 4:13 PM, Marc Ghorayeb dekay...@hotmail.com wrote:

   Hey,
   I got it to work. I just redid my steps; I had forgotten several
   libraries that were imported through the xml. PDF extraction seems to
   work once again, I have yet to find one that raises an exception!

   Thanks for the investigation, at least we now have a fix :)
   Marc
 

RE: Problem with pdf, upgrading Cell

2010-05-05 Thread Marc Ghorayeb

Praveen,
I am indeed using a trunk version from last week's svn i think. You could 
always try a version from the hudson builds. I did not try this procedure with 
Solr's 1.4 release though.

Marc  

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Marc Ghorayeb
 org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/tika-parsers-0.7.jar' to classloader
 May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xercesImpl-2.8.1.jar' to classloader
 May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xml-apis-1.0.b2.jar' to classloader
 May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/xmlbeans-2.3.0.jar' to classloader
 May 4, 2010 12:50:16 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/apache-solr-1.4.0/dist/apache-solr-cell-1.4.0.jar' to classloader
 May 4, 2010 12:50:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/apache-solr-1.4.0/dist/apache-solr-clustering-1.4.0.jar' to classloader
 May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/carrot2-mini-3.1.0.jar' to classloader
 May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/commons-lang-2.4.jar' to classloader
 May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/ehcache-1.6.2.jar' to classloader
 May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/google-collections-1.0-rc2.jar' to classloader
 May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/jackson-core-asl-0.9.9-6.jar' to classloader
 May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/jackson-mapper-asl-0.9.9-6.jar' to classloader
 May 4, 2010 12:51:52 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/clustering/lib/log4j-1.2.14.jar' to classloader
 
 
 
 Thanks,
 Sandhya
 
 -Original Message-
 From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll
 Sent: Tuesday, May 04, 2010 6:13 AM
 Cc: solr-user@lucene.apache.org
 Subject: Re: Problem with pdf, upgrading Cell
 
 Little more info... Seems to be a classloading issue.  The tests pass, but 
 they aren't loading the Tika libraries via the Solr ResourceLoader, whereas 
 the example is.  Marc, one thing to try is to unjar the Solr WAR file and put 
 the Tika libs in there, as I bet it will then work.  Note, however, I haven't 
 tried this.
 
 On May 3, 2010, at 6:24 PM, Grant Ingersoll wrote:
 
  I've opened https://issues.apache.org/jira/browse/SOLR-1902 to track this.  
  It is indeed a bug somewhere (still investigating).  It seems that Tika is 
  now picking an EmptyParser implementation when trying to determine which 
  parser to use, despite the fact that it properly identifies the MIME Type.
 
  -Grant
 
  On May 3, 2010, at 5:36 PM, Grant Ingersoll wrote:
 
  I'm investigating.
 
  On May 3, 2010, at 5:17 AM, Marc Ghorayeb wrote:
 
  Hi,
  Grant, i confirm what Praveen has said, any PDF i try does not work with 
  the new Tika and SVN versions. :(
  Marc
 
  From: sagar...@opentext.com
  To: solr-user@lucene.apache.org
  Date: Mon, 3 May 2010 13:05:24 +0530
  Subject: RE: Problem with pdf, upgrading Cell
 
  Hello,
 
  Please let me know if anybody figured out a way out of this issue.
 
  Thanks,
  Sandhya
 
  -Original Message-
  From: Praveen Agrawal [mailto:pkal...@gmail.com]
  Sent: Friday, April 30, 2010 11:14 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Problem with pdf, upgrading Cell
 
  Grant,
  You can try any of the sample pdfs that come in /docs folder of Solr 1.4
  dist'n. I had tried 'Installing Solr in Tomcat.pdf', 'index.pdf' etc. Only
  metadata i.e. stream_size, content_type apart from my own literals are
  indexed, and content is missing..
 
  On Fri, Apr 30, 2010 at 8:52 PM, Grant Ingersoll gsing...@apache.org wrote:
 
  Praveen and Marc,
 
  Can you share the PDF (feel free to email my private email) that fails in
  Solr?
 
  Thanks,
  Grant
 
  On Apr 30, 2010, at 7:55 AM, Marc Ghorayeb wrote:
 
  Hi
  Nope i didn't get it to work... Just like you, command line version of
  tika extracts correctly the content

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Marc Ghorayeb

Hey,
I got it to work. I just redid my steps, i had forgotten several libraries that 
were imported through the xml. PDF extraction seems to work once again, i have 
yet to find one that raises an exception!

Thanks for the investigation, at least we now have a fix :)
Marc  

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Marc Ghorayeb
  was
  extracted, it shows one empty title and one mine. For pdf where it
  extracted
  only one line of content, it shows that line as title too and mine one.
  'title' field is defined as multivalue in schema.
 
  Any idea, whats going on? or am i missing something?
 
 
 
  On Tue, May 4, 2010 at 4:13 PM, Marc Ghorayeb dekay...@hotmail.com
  wrote:
 
  
   Hey,
   I got it to work. I just redid my steps, i had forgotten several
  libraries
   that were imported through the xml. PDF extraction seems to work once
  again,
   i have yet to find one that raises an exception!
  
   Thanks for the investigation, at least we now have a fix :)
   Marc

RE: Problem with pdf, upgrading Cell

2010-05-03 Thread Marc Ghorayeb

Hi,
Grant, i confirm what Praveen has said, any PDF i try does not work with the 
new Tika and SVN versions. :(
Marc

 From: sagar...@opentext.com
 To: solr-user@lucene.apache.org
 Date: Mon, 3 May 2010 13:05:24 +0530
 Subject: RE: Problem with pdf, upgrading Cell
 
 Hello,
 
 Please let me know if anybody figured out a way out of this issue. 
 
 Thanks,
 Sandhya
 
 -Original Message-
 From: Praveen Agrawal [mailto:pkal...@gmail.com] 
 Sent: Friday, April 30, 2010 11:14 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Problem with pdf, upgrading Cell
 
 Grant,
 You can try any of the sample pdfs that come in /docs folder of Solr 1.4
 dist'n. I had tried 'Installing Solr in Tomcat.pdf', 'index.pdf' etc. Only
 metadata i.e. stream_size, content_type apart from my own literals are
 indexed, and content is missing..
 
 
 On Fri, Apr 30, 2010 at 8:52 PM, Grant Ingersoll gsing...@apache.orgwrote:
 
  Praveen and Marc,
 
  Can you share the PDF (feel free to email my private email) that fails in
  Solr?
 
  Thanks,
  Grant
 
 
  On Apr 30, 2010, at 7:55 AM, Marc Ghorayeb wrote:
 
  
   Hi,
   Nope, I didn't get it to work... Just like you, the command-line version
   of Tika extracts the content correctly, but once included in Solr, no
   content is extracted.
   What I have tried so far:
   - Updating the Tika libraries inside the Solr 1.4 public version: no luck
     there.
   - Downloading the latest SVN version, compiling it, and starting from a
     simple schema: still no luck.
   - Getting other versions compiled on Hudson (nightly builds) and testing
     them as well: still no extraction.
   I sent a mail to the developers' mailing list, but they told me I should
   just mail here. I hope some developer reads this, because it's quite an
   important feature of Solr and somehow it got broken between the 1.4
   release and the latest version in SVN.
   Marc
 
  --
  Grant Ingersoll
  http://www.lucidimagination.com/
 
  Search the Lucene ecosystem using Solr/Lucene:
  http://www.lucidimagination.com/search
 
 
  

Re: Problem with pdf, upgrading Cell

2010-04-30 Thread Marc Ghorayeb

Hi,
Nope, I didn't get it to work... Just like you, the command-line version of 
Tika extracts the content correctly, but once included in Solr, no content is 
extracted.
What I have tried so far:
- Updating the Tika libraries inside the Solr 1.4 public version: no luck 
  there.
- Downloading the latest SVN version, compiling it, and starting from a 
  simple schema: still no luck.
- Getting other versions compiled on Hudson (nightly builds) and testing them 
  as well: still no extraction.
I sent a mail to the developers' mailing list, but they told me I should just 
mail here. I hope some developer reads this, because it's quite an important 
feature of Solr and somehow it got broken between the 1.4 release and the 
latest version in SVN.
Marc  

RE: Problem with pdf, upgrading Cell

2010-04-26 Thread Marc Ghorayeb

Okay, I've been digging a little through the Java code from SVN, and it 
seems the load function inside the ExtractingDocumentLoader class does not 
receive the ContentStream (it is set to null...). Maybe I should send this to 
the developer mailing list?
Marc

 From: dekay...@hotmail.com
 To: solr-user@lucene.apache.org
 Subject: RE: Problem with pdf, upgrading Cell
 Date: Fri, 23 Apr 2010 16:03:28 +0200
 
 
 Seems like I'm not the only one with this no-extraction problem: 
 http://www.mail-archive.com/solr-user@lucene.apache.org/msg33609.html
 Apparently he tried the same thing, building from the trunk and indexing a 
 PDF, and no extraction occurred... Strange.
 Marc G.
 

Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb

Hello,
I configured a Solr server to be able to extract data from various documents, 
including PDFs. Unfortunately, the data extraction fails on several PDFs. I 
have read around here that this may be due to the old Tika library being used?
I looked around and saw that SVN had a newer version, so I checked out the 
trunk and built it using ant dist and ant example.
I then set up my schema in the newly built server, and put the library from 
the newly built Cell into the lib directory (in Solr's home). However, now all 
I get is a blank response... The indexing works, but it doesn't extract 
anything; only the literal values that I pass are indexed.
Any help would be greatly appreciated!! :)
Thank you.
Marc Ghorayeb 

RE: Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb

I'm launching it with the start.jar utility, and there doesn't seem to be 
anything weird in the console when I upload a PDF. Is there a way to output 
the console to a log file? The only log file that gets updated is one in the 
logs directory, and it seems to show only the input/output of the web 
requests (GETs and POSTs...).
For example:
127.0.0.1 -  -  [23/Apr/2010:13:06:47 +] GET /solr/core0/admin/luke?show=schema&wt=json HTTP/1.1 200 21690
127.0.0.1 -  -  [23/Apr/2010:13:06:47 +] GET /solr/core0/admin/luke?wt=json HTTP/1.1 200 780
127.0.0.1 -  -  [23/Apr/2010:13:06:57 +] POST /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Clucidworks-solr-refguide-1.4.pdf&literal.title=lucidworks-solr-refguide-1.4.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Flucidworks-solr-refguide-1.4.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1 HTTP/1.1 200 41
127.0.0.1 -  -  [23/Apr/2010:13:06:58 +] POST /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Cmysql-proxy-en.pdf&literal.title=mysql-proxy-en.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Fmysql-proxy-en.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1 HTTP/1.1 200 44
127.0.0.1 -  -  [23/Apr/2010:13:06:59 +] POST /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Cpython-cheat-sheet-v1.pdf&literal.title=python-cheat-sheet-v1.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Fpython-cheat-sheet-v1.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1 HTTP/1.1 200 44
127.0.0.1 -  -  [23/Apr/2010:13:07:00 +] POST /solr/core0/update HTTP/1.1 200 41
127.0.0.1 -  -  [23/Apr/2010:13:07:00 +] POST /solr/core0/update HTTP/1.1 200 41
127.0.0.1 -  -  [23/Apr/2010:13:07:05 +] GET /solr/core0/admin/schema.jsp HTTP/1.1 200 26395
127.0.0.1 -  -  [23/Apr/2010:13:07:05 +] GET /solr/core0/admin/jquery-1.2.3.min.js HTTP/1.1 304 0
I don't think that's going to help much :)
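
For reference, a sketch of how extract requests like the ones in this log can be assembled on the client side. Only the /update/extract path and the literal.* parameter naming come from the log; the class and method names are hypothetical, and a Map is used for brevity, so it cannot express repeated parameters such as the several literal.group values.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

public class ExtractUrlSketch {
    // URL-encodes a value as application/x-www-form-urlencoded (UTF-8).
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Builds <base>/update/extract?literal.<name>=<value>&... from the map,
    // in the map's iteration order.
    static String extractUrl(String base, Map<String, String> literals) {
        StringBuilder sb = new StringBuilder(base).append("/update/extract?");
        boolean first = true;
        for (Map.Entry<String, String> e : literals.entrySet()) {
            if (!first) {
                sb.append('&');
            }
            sb.append("literal.").append(encode(e.getKey()))
              .append('=').append(encode(e.getValue()));
            first = false;
        }
        return sb.toString();
    }
}
```

Passing a LinkedHashMap keeps the parameters in insertion order, which makes the resulting URL easy to compare against the access log.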
 Date: Fri, 23 Apr 2010 06:04:34 -0700
 From: otis_gospodne...@yahoo.com
 Subject: Re: Problem with pdf, upgrading Cell
 To: solr-user@lucene.apache.org
 
 Marc, got anything in your logs?
 
  Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/
 
 
 
 - Original Message 
  From: Marc Ghorayeb dekay...@hotmail.com
  To: solr-user@lucene.apache.org
  Sent: Fri, April 23, 2010 8:42:53 AM
  Subject: Problem with pdf, upgrading Cell
  
  
 Hello,
 I configured a Solr server to be able to extract data from various 
  documents, including pdfs. Unfortunately, the data extraction fails on 
  several 
  pdfs. I have read around here that this may be due to the old Tika library 
  being 
  used?I looked around and saw that the svn had a newer version so i checked 
  out 
  the trunk, and built it using ant dist, and ant example.I then set up my 
  schema 
  in the newly built server, and inserted the library from the newly built 
  cell 
  into the lib directory (in solr's home). However, now all i get is a blank 
  response... The indexing works, but it doesn't extract anything, only the 
  literal values that i pass on are indexed.
 Any help would be greatly 
  appreciated!! :)
 Thank you.
 Marc Ghorayeb 
  

  

RE: Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb

Seems like I'm not the only one with this no-extraction problem: 
http://www.mail-archive.com/solr-user@lucene.apache.org/msg33609.html
Apparently he tried the same thing, building from the trunk and indexing a 
PDF, and no extraction occurred... Strange.
Marc G.

 From: dekay...@hotmail.com
 To: solr-user@lucene.apache.org
 Subject: RE: Problem with pdf, upgrading Cell
 Date: Fri, 23 Apr 2010 15:12:39 +0200
 
 
 I'm launching it with the start.jar utility, and there doesn't seem to be 
 anything weird inside the console when i upload a pdf. Is there a way to 
 output the console to a log file? The only log file that get's updated is a 
 log file in the logs directory, and it seems to only show the input/ouput of 
 the web requests (get and posts...).
 for example:127.0.0.1 -  -  [23/Apr/2010:13:06:47 +] GET 
 /solr/core0/admin/luke?show=schemawt=json HTTP/1.1 200 21690 127.0.0.1 -  - 
  [23/Apr/2010:13:06:47 +] GET /solr/core0/admin/luke?wt=json HTTP/1.1 
 200 780 127.0.0.1 -  -  [23/Apr/2010:13:06:57 +] POST 
 /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Clucidworks-solr-refguide-1.4.pdfliteral.title=lucidworks-solr-refguide-1.4.pdfliteral.url=http%3A%2F%2Fwww.3ds.com%2Flucidworks-solr-refguide-1.4.pdfliteral.appKey=medialiteral.type=documentliteral.siteHash=53e446a6b81860dcfa1cc2fef4ef976bliteral.group=portalliteral.group=varliteral.group=0literal.group=caa_goldliteral.group=caa_partnerliteral.group=ag12literal.group=ag17wt=javabinversion=1
  HTTP/1.1 200 41 127.0.0.1 -  -  [23/Apr/2010:13:06:58 +] POST 
 /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Cmysql-proxy-en.pdfliteral.title=mysql-proxy-en.pdfliteral.url=http%3A%2F%2Fwww.3ds.com%2Fmysql-proxy-en.pdfliteral.appKey=medialiteral.type=documentliteral.siteHash=53e446a6b81860dcfa1cc2fef4ef976bliteral.group=portalliteral.group=varliteral.group=0literal.group=caa_goldliteral.group=caa_partnerliteral.group=ag12literal.group=ag17wt=javabinversion=1
  HTTP/1.1 200 44 127.0.0.1 -  -  [23/Apr/2010:13:06:59 +] POST 
 /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Cpython-cheat-sheet-v1.pdfliteral.title=python-cheat-sheet-v1.pdfliteral.url=http%3A%2F%2Fwww.3ds.com%2Fpython-cheat-sheet-v1.pdfliteral.appKey=medialiteral.type=documentliteral.siteHash=53e446a6b81860dcfa1cc2fef4ef976bliteral.group=portalliteral.group=varliteral.group=0literal.group=caa_goldliteral.group=caa_partnerliteral.group=ag12literal.group=ag17wt=javabinversion=1
  HTTP/1.1 200 44 127.0.0.1 -  -  [23/Apr/2010:13:07:00 +] POST 
 /solr/core0/update HTTP/1.1 200 41 127.0.0.1 -  -  [23/Apr/2010:13:07:00 
 +] POST /solr/core0/update HTTP/1.1 200 41 127.0.0.1 -  -  
 [23/Apr/2010:13:07:05 +] GET /solr/core0/admin/schema.jsp HTTP/1.1 200 
 26395 127.0.0.1 -  -  [23/Apr/2010:13:07:05 +] GET 
 /solr/core0/admin/jquery-1.2.3.min.js HTTP/1.1 304 0 
 I don't think that's going to help much :)
  Date: Fri, 23 Apr 2010 06:04:34 -0700
  From: otis_gospodne...@yahoo.com
  Subject: Re: Problem with pdf, upgrading Cell
  To: solr-user@lucene.apache.org
  
  Marc, got anything in your logs?
  
   Otis
  
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
  Lucene ecosystem search :: http://search-lucene.com/
  
  
  
  - Original Message 
   From: Marc Ghorayeb dekay...@hotmail.com
   To: solr-user@lucene.apache.org
   Sent: Fri, April 23, 2010 8:42:53 AM
   Subject: Problem with pdf, upgrading Cell
   
   
  Hello,
  I configured a Solr server to be able to extract data from various 
   documents, including pdfs. Unfortunately, the data extraction fails on 
   several 
   pdfs. I have read around here that this may be due to the old Tika 
   library being 
   used?I looked around and saw that the svn had a newer version so i 
   checked out 
   the trunk, and built it using ant dist, and ant example.I then set up my 
   schema 
   in the newly built server, and inserted the library from the newly built 
   cell 
   into the lib directory (in solr's home). However, now all i get is a 
   blank 
   response... The indexing works, but it doesn't extract anything, only the 
   literal values that i pass on are indexed.
  Any help would be greatly 
   appreciated!! :)
  Thank you.
  Marc Ghorayeb 
   
 
   

RE: Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb

Seems like I'm not the only one with this no-extraction problem: 
http://www.mail-archive.com/solr-user@lucene.apache.org/msg33609.html
Apparently he tried the same thing, building from the trunk and indexing a 
PDF, and no extraction occurred... Strange.
Marc G.
  

RE: Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb
PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@105585dc main
    filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 23, 2010 5:47:14 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@105585dc main from searc...@2efeecca main
    queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 23, 2010 5:47:14 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@105585dc main
    queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 23, 2010 5:47:14 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@105585dc main from searc...@2efeecca main
    documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 23, 2010 5:47:14 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@105585dc main
    documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 23, 2010 5:47:14 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to searc...@105585dc main
Apr 23, 2010 5:47:14 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Apr 23, 2010 5:47:14 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher searc...@105585dc main
Apr 23, 2010 5:47:14 PM org.apache.solr.search.SolrIndexSearcher close
INFO: Closing searc...@2efeecca main
    fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
    filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
    queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
    documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 23, 2010 5:47:14 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {optimize=} 0 46
Apr 23, 2010 5:47:14 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={optimize=true&waitSearcher=true&maxSegments=1&waitFlush=true&wt=javabin&version=1} status=0 QTime=46
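
Solr prints request parameters in the log as one query string, as in the params={...} entry above, and the /update/extract request quoted further down passes its metadata the same way, as repeated literal.* parameters. A minimal sketch of building such a query string with proper URL encoding might look like this. The class name, method names, and sample values are hypothetical and are not taken from the indexer discussed in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

public class ExtractUrlSketch {

    /** URL-encodes one name or value as UTF-8 form data (a space becomes '+'). */
    public static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    /** Joins name/value pairs with '&', keeping repeated names in order. */
    public static String buildQuery(List<String[]> params) {
        StringBuilder sb = new StringBuilder();
        for (String[] p : params) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(p[0])).append('=').append(encode(p[1]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> params = new ArrayList<String[]>();
        // Hypothetical values, shaped like the request quoted in the thread.
        params.add(new String[] {"literal.id", "C:\\docs\\my guide.pdf"});
        params.add(new String[] {"literal.title", "my guide.pdf"});
        params.add(new String[] {"literal.group", "portal"});
        params.add(new String[] {"literal.group", "var"});
        params.add(new String[] {"wt", "javabin"});
        System.out.println("/solr/core0/update/extract?" + buildQuery(params));
    }
}
```

A list of pairs is used instead of a map so that a multi-valued parameter such as literal.group can appear more than once.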
 Date: Fri, 23 Apr 2010 08:03:14 -0700
 From: otis_gospodne...@yahoo.com
 Subject: Re: Problem with pdf, upgrading Cell
 To: solr-user@lucene.apache.org
 
 Marc,
 
 These are your request logs.  You want to look at your Solr logs.
 
  Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/
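
 The timestamp line followed by an INFO: line, as seen in the log excerpts in this thread, is the default java.util.logging SimpleFormatter layout. A minimal sketch of sending such records to a file with a FileHandler could look like the following. The class name, logger name, and file name are hypothetical, and this is a standalone illustration, not Solr's own configuration. For a running server the same handler is normally selected through a logging.properties file passed with -Djava.util.logging.config.file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class JulFileLogSketch {

    /**
     * Attaches a FileHandler to the named logger, writes one INFO record,
     * detaches the handler again, and returns the text now in the file.
     */
    public static String logToFile(Path file, String loggerName, String message)
            throws IOException {
        Logger logger = Logger.getLogger(loggerName);
        FileHandler handler = new FileHandler(file.toString(), true); // append mode
        handler.setFormatter(new SimpleFormatter()); // plain text instead of the XML default
        logger.addHandler(handler);
        try {
            logger.log(Level.INFO, message);
        } finally {
            logger.removeHandler(handler);
            handler.close(); // flushes the record and releases the lock file
        }
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file name; a temp file keeps the sketch side-effect free.
        Path file = Files.createTempFile("solr-sketch", ".log");
        System.out.print(logToFile(file, "sketch.solr", "extraction finished"));
    }
}
```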
 
 
 
 ----- Original Message -----
  From: Marc Ghorayeb dekay...@hotmail.com
  To: solr-user@lucene.apache.org
  Sent: Fri, April 23, 2010 9:12:39 AM
  Subject: RE: Problem with pdf, upgrading Cell
  
  
  I'm launching it with the start.jar utility, and there doesn't seem to be
  anything weird in the console when I upload a PDF. Is there a way to
  output the console to a log file? The only log file that gets updated is
  one in the logs directory, and it seems to show only the input/output of
  the web requests (GETs and POSTs...).
  For example:
  127.0.0.1 -  -  [23/Apr/2010:13:06:47 +] GET /solr/core0/admin/luke?show=schema&wt=json HTTP/1.1 200 21690
  127.0.0.1 -  -  [23/Apr/2010:13:06:47 +] GET /solr/core0/admin/luke?wt=json HTTP/1.1 200 780
  127.0.0.1 -  -  [23/Apr/2010:13:06:57 +] POST /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Clucidworks-solr-refguide-1.4.pdf&literal.title=lucidworks-solr-refguide-1.4.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Flucidworks-solr-refguide-1.4.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1
   
  HTTP/1.1 200